Methods and apparatuses for encoding and decoding video using multiple reference pictures

ABSTRACT

A method of encoding video using a plurality of reference pictures is provided. The method includes: writing a parameter indicating a temporal level of a reference picture or a period of a type of a reference picture into the reference picture for each of the plurality of reference pictures; creating a first list of reference pictures comprising the plurality of reference pictures sorted based on the parameter; and encoding a current picture of the video using at least the first list of reference pictures. A method of decoding video using a plurality of reference pictures is also provided. The method includes: parsing a parameter indicating a temporal level of a reference picture or a period of a type of a reference picture from each of the plurality of reference pictures; creating a first list of reference pictures comprising the plurality of reference pictures sorted based on the parameter; and decoding a current picture of the video using at least the first list of reference pictures. In addition, corresponding apparatuses for encoding and decoding video are provided.

TECHNICAL FIELD

The present invention relates to methods of encoding and decoding video using a plurality of reference pictures, and apparatuses thereof, and more particularly, to inter-picture prediction.

BACKGROUND ART

State-of-the-art video coding schemes, such as MPEG-4 AVC/H.264 and the upcoming HEVC (High-Efficiency Video Coding), support inter-picture prediction utilizing motion-compensated prediction from more than one reference picture. These schemes also support a special type of bi-directional inter-picture prediction where both predictions point in the same direction in time. FIG. 1 shows an example of such forward bi-predictive inter-picture prediction. In the case where there is more than one reference frame/picture, two lists of reference pictures are created for bi-predictive inter-picture prediction, and the reference pictures that are temporally closer to the current picture are sorted to the top of the lists by a predefined scheme.

It is against this background that the present invention has been developed.

CITATION LIST

Non Patent Literature

NPL 1: ISO/IEC 14496-10, “MPEG-4 Part 10 Advanced Video Coding”

SUMMARY OF INVENTION

Technical Problem

A problem with the prior art is that reference pictures closest to the current picture are always or typically sorted to the top of the lists. However, the closest reference frames to the current picture may not always be the best reference pictures to be used for forward bi-predictive inter-picture prediction.

Solution to Problem

According to embodiment(s) of the present invention, methods of encoding/decoding video are provided to solve or at least mitigate the problem associated with the prior art described hereinbefore. For example, the methods allow for inter-picture prediction using two reference lists where one of the lists is ordered based on a temporal level or a period of a type of the reference pictures/frames.

By way of example, according to embodiment(s) of the present invention, when forward bi-predictive inter-picture prediction is used, two reference lists are created where one of the reference lists is ordered based on the temporal level of the reference pictures while the other reference list is ordered based on the nearest temporal distance to the current picture. Predefined reference picture lists constructed using embodiment(s) of the present invention allow a hierarchical coding structure to be used efficiently while minimizing the bits spent on reference list reordering signals.

Benefits of a hierarchical coding structure include improved coding efficiency and better picture quality. By way of example, in a hierarchical coding structure according to embodiment(s) of the present invention, pictures are arranged into temporal levels where the lowest level represents the lowest frame rate and inclusion of subsequent higher levels represents higher frame rates. Examples of the hierarchical coding structure are shown in FIG. 2. According to embodiment(s) of the present invention, a certain amount of coding gain can be achieved by coding pictures at lower temporal levels with better quality (for example by applying less quantization) than pictures at higher temporal levels. In HEVC, temporal levels are indicated by means of the syntax parameter temporal_id located in the NAL (Network Abstraction Layer) unit header of a coded slice of a picture.
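The idea of applying less quantization at lower temporal levels can be illustrated with a minimal sketch. The linear rule, the function name qp_for_temporal_level and the step size are assumptions made only for illustration; the description does not prescribe any particular mapping from temporal level to quantization parameter (QP).

```python
def qp_for_temporal_level(base_qp: int, temporal_level: int,
                          step: int = 1, max_qp: int = 51) -> int:
    """Return a QP that grows with the temporal level.

    Hypothetical linear rule: pictures at level 0 get base_qp (finest
    quantization); each higher level adds `step`. 51 is the largest QP
    for 8-bit video in H.264/HEVC.
    """
    if temporal_level < 0:
        raise ValueError("temporal_level must be non-negative")
    return min(base_qp + step * temporal_level, max_qp)


# QP assigned to each of four temporal levels for a base QP of 30.
qps = {level: qp_for_temporal_level(30, level) for level in range(4)}
```

Under this assumed rule, the pictures that are referenced most often (lowest temporal level) are coded with the smallest QP, which is one way the quality ordering of FIG. 2 could arise.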

According to a first aspect of the present invention, there is provided a method of encoding video using a plurality of reference pictures, the method comprising:

writing a parameter indicating a temporal level of a reference picture or a period of a type of a reference picture into the reference picture for each of the plurality of reference pictures;

creating a first list of reference pictures comprising the plurality of reference pictures sorted based on the parameter; and

encoding a current picture of the video using at least the first list of reference pictures.

The step of encoding the current picture may comprise performing motion estimation and motion prediction for the current picture using at least the first list of reference pictures.

The step of creating the first list may further comprise sorting the reference pictures having the same temporal level based on their temporal distance to the current picture.

The step of creating the first list may comprise:

selecting a first group of reference pictures including one or more reference pictures of the plurality of reference pictures having a temporal level equal to a predetermined value;

selecting a second group of reference pictures including reference pictures of the plurality of reference pictures having a temporal level not equal to the predetermined value; and

positioning the first group of reference pictures above the second group of reference pictures within the first list.
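The grouping steps above can be sketched as follows. This is a minimal illustration only: the RefPicture type, its field names, and the choice of 0 as the predetermined value are assumptions, and the input is taken to be already ordered by temporal distance so that the order within each group is preserved.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RefPicture:
    poc: int             # output-order index (hypothetical field name)
    temporal_level: int  # the parameter written into the reference picture


def build_first_list(refs: List[RefPicture],
                     predetermined_level: int = 0) -> List[RefPicture]:
    """Place pictures whose level equals the predetermined value on top."""
    first_group = [r for r in refs if r.temporal_level == predetermined_level]
    second_group = [r for r in refs if r.temporal_level != predetermined_level]
    # List comprehensions keep the incoming order inside each group.
    return first_group + second_group


# Input ordered nearest-first relative to an assumed current picture 4.
refs = [RefPicture(3, 2), RefPicture(2, 1), RefPicture(1, 2), RefPicture(0, 0)]
first_list = build_first_list(refs)
order = [r.poc for r in first_list]  # [0, 3, 2, 1]
```

Because only the first group is moved, the remaining pictures keep their temporal-distance order, which is consistent with creating the first list by modifying an existing distance-sorted list.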

The method may further comprise:

creating a second list of reference pictures sorted based on the temporal distance of the reference pictures to the current picture; and

creating a third list of reference pictures sorted based on the temporal distance of the reference pictures to the current picture.

The method may further comprise:

determining whether the second list matches the third list; and

if the second list matches the third list, the first list is created by being sorted based on the parameter and the current picture is encoded using at least the first list of reference pictures, and

if the second list does not match the third list, the current picture is encoded using the second and third lists of reference pictures.
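The matching test can be sketched as below. The way the two distance-based lists are built here (past pictures first for one list, future pictures first for the other, in the style of H.264 default list initialization) is an assumption; the text only states that both lists are sorted by temporal distance. The function names, the level_of mapping, and the choice to pair the first list with the second list in the matching case are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def distance_lists(ref_pocs: List[int], current_poc: int) -> Tuple[List[int], List[int]]:
    """Build two distance-sorted lists (assumed H.264-style ordering)."""
    past = sorted((p for p in ref_pocs if p < current_poc), reverse=True)
    future = sorted(p for p in ref_pocs if p > current_poc)
    return past + future, future + past


def select_lists(ref_pocs: List[int], current_poc: int,
                 level_of: Dict[int, int]) -> Tuple[List[int], List[int]]:
    second, third = distance_lists(ref_pocs, current_poc)
    if second == third:
        # Both lists carry the same information, so build the first list
        # from the temporal level, breaking ties by temporal distance.
        first = sorted(ref_pocs,
                       key=lambda p: (level_of[p], abs(current_poc - p)))
        return first, second
    return second, third


levels = {4: 0, 5: 2, 6: 1, 7: 2}
forward_only = select_lists([4, 5, 6, 7], 8, levels)
two_sided = select_lists([0, 2, 6, 8], 4, {0: 0, 2: 1, 6: 1, 8: 0})
```

With all reference pictures preceding the current picture, the two distance-based lists coincide and the level-sorted list replaces one of them; with references on both sides, the distance-based lists differ and are used unchanged.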

The method may further comprise:

writing a flag into a header of the current picture; and

determining whether the flag is of a predefined value,

if the flag is of the predefined value, the first list is created by being sorted based on the parameter and the current picture is encoded using at least the first list of reference pictures;

if the flag is not of the predefined value, the first list is created by being sorted based on at least a prediction dependency of the reference pictures, and the current picture is encoded using at least the first list of reference pictures.

The first list may be created by modifying either the second list or the third list using the parameter.

According to a second aspect of the present invention, there is provided a method of decoding video using a plurality of reference pictures, the method comprising:

parsing a parameter indicating a temporal level of a reference picture or a period of a type of a reference picture from each of the plurality of reference pictures;

creating a first list of reference pictures comprising the plurality of reference pictures sorted based on the parameter; and

decoding a current picture of the video using at least the first list of reference pictures.

The step of decoding the current picture may comprise performing motion prediction for the current picture using at least the first list of reference pictures.

The step of creating the first list may further comprise sorting the reference pictures having the same temporal level based on their temporal distance to the current picture.

The step of creating the first list may comprise:

selecting a first group of reference pictures including one or more reference pictures of the plurality of reference pictures having a temporal level equal to a predetermined value;

selecting a second group of reference pictures including reference pictures of the plurality of reference pictures having a temporal level not equal to the predetermined value; and

positioning the first group of reference pictures above the second group of reference pictures within the first list.

The method may further comprise:

creating a second list of reference pictures sorted based on the temporal distance of the reference pictures to the current picture; and

creating a third list of reference pictures sorted based on the temporal distance of the reference pictures to the current picture.

The method may further comprise:

determining whether the second list matches the third list; and

if the second list matches the third list, the first list is created by being sorted based on the parameter and the current picture is decoded using at least the first list of reference pictures, and

if the second list does not match the third list, the current picture is decoded using the second and third lists of reference pictures.

The method may further comprise:

parsing a flag from a header of the current picture; and

determining whether the flag is of a predefined value,

if the flag is of the predefined value, the first list is created by being sorted based on the parameter and the current picture is decoded using at least the first list of reference pictures;

if the flag is not of the predefined value, the first list is created by being sorted based on at least a prediction dependency of the reference pictures, and the current picture is decoded using at least the first list of reference pictures.

The first list may be created by modifying either the second list or the third list using the parameter.

According to a third aspect of the present invention, there is provided an apparatus for encoding video using a plurality of reference pictures, the apparatus comprising:

a writing unit configured to write a parameter indicating a temporal level of a reference picture or a period of a type of a reference picture into the reference picture for each of the plurality of reference pictures;

a first list creation unit configured to create a first list of reference pictures comprising the plurality of reference pictures sorted based on the parameter; and

an encoding section configured to encode a current picture of the video using at least the first list of reference pictures.

According to a fourth aspect of the present invention, there is provided an apparatus for decoding video using a plurality of reference pictures, the apparatus comprising:

a parsing unit configured for parsing a parameter indicating a temporal level of a reference picture or a period of a type of a reference picture from each of the plurality of reference pictures;

a first list creation unit configured for creating a first list of reference pictures comprising the plurality of reference pictures sorted based on the parameter; and

a decoding unit configured for decoding a current picture of the video using at least the first list of reference pictures.

According to a fifth aspect of the present invention, there is provided a method of encoding video using a plurality of reference pictures, the method comprising:

selecting a technique among a plurality of predetermined techniques for constructing a list of reference pictures;

encoding a current picture using the list of reference pictures constructed based on the selected technique; and

writing a parameter indicating the selected technique into the current picture.

According to a sixth aspect of the present invention, there is provided a method of decoding video using a plurality of reference pictures, the method comprising:

parsing a parameter indicating a selected technique among a plurality of predetermined techniques for constructing a list of reference pictures; and

decoding a current picture using the list of reference pictures constructed based on the selected technique.

Advantageous Effects of Invention

The effect of the present invention is an improvement in coding efficiency: the present invention provides two different reference picture lists, which improve picture quality with a negligible increase in overhead information.

BRIEF DESCRIPTION OF DRAWINGS

Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 depicts a diagram illustrating an example of bi-predictive inter-picture prediction with both predictions in the same direction;

FIG. 2 depicts a diagram illustrating an example of typical correspondence between temporal level and quality of coded pictures;

FIG. 3 depicts a diagram illustrating examples of reference picture lists constructed according to the prior art and an embodiment of the present invention for comparison;

FIG. 4 depicts a flowchart illustrating a method of encoding video according to a first embodiment of the present invention;

FIG. 5 depicts a flowchart illustrating a method of decoding video according to the first embodiment of the present invention;

FIG. 6 depicts a block diagram illustrating an example apparatus for encoding video according to the first embodiment of the present invention;

FIG. 7 depicts a block diagram illustrating an example apparatus for decoding video according to the first embodiment of the present invention;

FIG. 8 depicts a diagram showing exemplary locations of the parameter indicating the selected technique among a plurality of pre-determined techniques for reference picture list construction in a header of a coded video bitstream according to embodiments of the present invention;

FIG. 9 depicts a flowchart illustrating a method of encoding video according to a second embodiment of the present invention;

FIG. 10 depicts a flowchart illustrating a method of decoding video according to the second embodiment of the present invention;

FIG. 11A depicts a flowchart illustrating a method of encoding video according to a third embodiment of the present invention;

FIG. 11B depicts a flowchart illustrating a method of encoding video according to another embodiment of the present invention;

FIG. 12A depicts a flowchart illustrating a method of decoding video according to the third embodiment of the present invention;

FIG. 12B depicts a flowchart illustrating a method of decoding video according to another embodiment of the present invention;

FIG. 13 depicts a flowchart illustrating an embodiment of the present invention for sorting the third list of reference pictures;

FIG. 14 depicts a flowchart illustrating another embodiment of the present invention for sorting the third list of reference pictures;

FIG. 15 depicts a block diagram illustrating an example apparatus for encoding video according to the third embodiment of the present invention;

FIG. 16 depicts a block diagram illustrating an example apparatus for decoding video according to the third embodiment of the present invention;

FIG. 17 depicts a diagram showing an example location of the parameter for indicating the temporal level of a picture in a header of a coded slice of a coded picture;

FIG. 18A depicts a flowchart illustrating a method of encoding videoaccording to a fourth embodiment of the present invention;

FIG. 18B depicts a flowchart illustrating a method of encoding videoaccording to another embodiment of the present invention;

FIG. 19A depicts a flowchart illustrating a method of decoding videoaccording to the fourth embodiment of the present invention;

FIG. 19B depicts a flowchart illustrating a method of decoding videoaccording to the fourth embodiment of the present invention;

FIG. 20 depicts a flowchart illustrating yet another embodiment of the present invention for sorting the third list of reference pictures;

FIG. 21 depicts a block diagram illustrating an example apparatus for encoding video according to the fourth embodiment of the present invention;

FIG. 22 depicts a block diagram illustrating an example apparatus for decoding video according to the fourth embodiment of the present invention;

FIG. 23 depicts a diagram showing exemplary locations of the parameter indicating the period of the type of pictures in a header of a coded video bitstream;

FIG. 24A depicts a flowchart illustrating a method of encoding video according to a fifth embodiment of the present invention;

FIG. 24B depicts a flowchart illustrating a method of encoding video according to another embodiment of the present invention;

FIG. 25A depicts a flowchart illustrating a method of decoding video according to the fifth embodiment of the present invention;

FIG. 25B depicts a flowchart illustrating a method of decoding video according to another embodiment of the present invention;

FIG. 26 depicts an overall configuration of a content providing system for implementing content distribution services according to an embodiment of the present invention;

FIG. 27 depicts an overall configuration of a digital broadcasting system according to an embodiment of the present invention;

FIG. 28 depicts a block diagram illustrating an example of a configuration of a television according to an embodiment of the present invention;

FIG. 29 depicts a block diagram illustrating an example of a configuration of an information reproducing/recording unit that reads and writes information from or on a recording medium that is an optical disk according to an embodiment of the present invention;

FIG. 30 depicts a drawing showing an example of a configuration of a recording medium that is an optical disk according to an embodiment of the present invention;

FIG. 31A depicts a drawing illustrating an example of a cellular phone;

FIG. 31B depicts a block diagram showing an example of a configuration of the cellular phone according to an embodiment of the present invention;

FIG. 32 depicts a drawing showing a structure of multiplexed data according to an embodiment of the present invention;

FIG. 33 depicts a drawing schematically illustrating how each of the streams is multiplexed in multiplexed data according to an embodiment of the present invention;

FIG. 34 depicts a drawing illustrating how a video stream is stored in a stream of PES packets in more detail according to an embodiment of the present invention;

FIG. 35 depicts a drawing showing a structure of TS packets and source packets in the multiplexed data according to an embodiment of the present invention;

FIG. 36 depicts a drawing showing a data structure of a PMT according to an embodiment of the present invention;

FIG. 37 depicts a drawing showing an internal structure of multiplexed data information according to an embodiment of the present invention;

FIG. 38 depicts a drawing showing an internal structure of stream attribute information according to an embodiment of the present invention;

FIG. 39 depicts a drawing showing steps for identifying video data according to an embodiment of the present invention;

FIG. 40 depicts a block diagram illustrating an example of a configuration of an integrated circuit for implementing the video coding method and the video decoding method according to each of the embodiments;

FIG. 41 depicts a drawing showing a configuration for switching between driving frequencies according to an embodiment of the present invention;

FIG. 42 depicts a drawing showing steps for identifying video data and switching between driving frequencies according to an embodiment of the present invention;

FIG. 43 depicts a drawing showing an example of a look-up table in which the standards of video data are associated with the driving frequencies according to an embodiment of the present invention;

FIG. 44A depicts a drawing showing an example of a configuration for sharing a module of a signal processing unit; and

FIG. 44B depicts a drawing showing another example of a configuration for sharing a module of a signal processing unit according to an embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

According to exemplary embodiments of the present invention, there are provided methods of encoding video using a plurality of reference pictures/frames, methods of decoding video using a plurality of reference pictures, and apparatuses thereof.

FIG. 3 shows two examples (Example #1 and Example #2) of reference list construction according to an embodiment of the present invention in comparison with the reference lists constructed according to the prior art. In the prior art, the reference list construction method is based on the temporal distance to the current picture. Therefore, in the examples illustrated, two identical lists of reference pictures (the first list and second list in FIG. 3) would be created according to the prior art. In contrast, according to an embodiment of the present invention, at least one reference list (e.g., the third list in FIG. 3) is created whereby the reference pictures in the list are ordered/sorted based on the temporal level of the reference pictures. Referring to FIG. 3, in Example #1, the reference pictures are ordered as follows (starting from the top of the reference list): reference picture “0”, reference picture “2”, reference picture “3”, and reference picture “1”. In Example #2, the reference pictures are ordered as follows (starting from the top of the reference list): reference picture “8”, reference picture “10”, reference picture “9”, and reference picture “7”. The inter-picture predictive coding is then performed using at least the third reference list. For example, the inter-picture predictive coding may be performed using only the third list or the third and second lists, instead of the first and second lists as taught in the prior art.
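The two orderings described for FIG. 3 can be reproduced with a short sketch. The temporal levels assigned to each picture and the positions of the current pictures (4 and 11) are not stated in the text; they are assumptions chosen only because, together with a tie-break on temporal distance, they yield the orderings given above.

```python
from typing import Dict, List


def sort_by_temporal_level(level_of: Dict[int, int], current: int) -> List[int]:
    """Sort by increasing temporal level, then by distance to the current picture."""
    return sorted(level_of, key=lambda pic: (level_of[pic], abs(current - pic)))


# Assumed temporal levels (picture number -> level); not taken from FIG. 3.
example_1_levels = {0: 0, 1: 2, 2: 1, 3: 2}
example_2_levels = {7: 2, 8: 0, 9: 2, 10: 1}

third_list_1 = sort_by_temporal_level(example_1_levels, current=4)
third_list_2 = sort_by_temporal_level(example_2_levels, current=11)

# For comparison, the prior-art ordering by temporal distance alone.
prior_art_1 = sorted(example_1_levels, key=lambda pic: abs(4 - pic))
```

Under these assumptions the level-sorted lists are 0, 2, 3, 1 and 8, 10, 9, 7, while the distance-sorted list for Example #1 is 3, 2, 1, 0.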

For consistency and clarity, unless otherwise specified, the temporal level will be described in embodiment(s) of the present invention using a convention where the lowest value of temporal level (e.g., value “0”) indicates the primary picture(s) at the lowest frame rate, and subsequent higher values of temporal level (e.g., values “1”, “2” and “3”) respectively indicate the subsequent sets of pictures producing higher (e.g., two times) frame rates when added on top of the lower temporal levels. The same convention is used in recent video coding schemes such as HEVC, the H.264 MVC extension and the H.264 SVC extension, in which temporal level is indicated using the syntax parameter temporal_id. However, it will be apparent to those skilled in the art that other conventions can be applied without departing from the scope of the present invention. For example, a greater value of temporal level can instead indicate a lower frame rate and still serve the same purpose.
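The frame-rate doubling in this convention can be illustrated with a small sketch. It assumes a dyadic hierarchy in which the group-of-pictures size is 2 to the power (number of levels minus 1); this structure and the function names are assumptions for illustration, and other hierarchies are equally possible.

```python
def temporal_level(poc: int, num_levels: int) -> int:
    """Temporal level of a picture in an assumed dyadic hierarchy."""
    gop_size = 1 << (num_levels - 1)
    pos = poc % gop_size
    if pos == 0:
        return 0  # primary picture at the lowest frame rate
    level = num_levels - 1
    while pos % 2 == 0:  # each factor of two moves the picture one level down
        pos //= 2
        level -= 1
    return level


def pictures_per_gop(max_level: int, num_levels: int) -> int:
    """Count pictures kept per GOP when decoding levels 0..max_level."""
    gop_size = 1 << (num_levels - 1)
    return sum(1 for poc in range(gop_size)
               if temporal_level(poc, num_levels) <= max_level)


levels = [temporal_level(poc, 4) for poc in range(9)]
kept = [pictures_per_gop(t, 4) for t in range(4)]
```

With four levels, each additional level doubles the number of pictures retained per group of eight, which corresponds to the "two times" frame-rate increase mentioned above.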

FIG. 4 shows a flowchart describing a process of encoding video using a plurality of reference pictures according to a first exemplary embodiment of the present invention. In step 400, one of a plurality of pre-determined techniques or methods for reference picture list construction is selected. For example, the plurality of predetermined techniques may include a first technique which constructs a reference picture list based on the temporal distance of the reference pictures to a current picture. In this case, the temporal level of the reference pictures can be used for supporting the feature of temporal scalability in the coded video bitstream. The plurality of predetermined techniques may further include a second technique which constructs a reference picture list based on the temporal level of the reference pictures. As an example, according to the second technique, a list of reference pictures can be created whereby the reference pictures in the list are sorted by increasing temporal level (i.e., the reference picture(s) having the lowest temporal level are sorted to the top of the list).

In step 402, the current picture is encoded (e.g., motion-compensated inter-prediction encoding) using the selected technique of reference picture list construction. In step 404, a parameter or flag indicating the selected technique of reference picture list construction is written into a header of the picture (coded video bitstream). For example, the parameter values “0” and “1” may indicate the first and second techniques, respectively.

FIG. 5 shows a flowchart describing a process of decoding video using a plurality of reference pictures according to the first exemplary embodiment of the present invention. In step 500, a header of a current picture (coded video bitstream) is parsed to obtain a parameter or flag indicating the selected technique among the plurality of pre-determined techniques for reference picture list construction. In step 502, using the parameter to determine the selected technique, the current picture is decoded (e.g., motion-compensated inter-prediction decoding) based on the selected technique of reference picture list construction. Preferably, the same plurality of pre-determined techniques for reference picture list construction is present in the encoding process and the decoding process according to the first exemplary embodiment as described hereinbefore.
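The signalling of the selected technique in FIGS. 4 and 5 can be sketched as a round trip. The header is modelled as a plain dictionary and the field name ref_list_technique is hypothetical; actual motion-compensated coding is omitted, and the tie-break by temporal distance in the second technique is an assumption.

```python
from typing import Dict, List, Tuple

TEMPORAL_DISTANCE = 0  # first technique (parameter value "0")
TEMPORAL_LEVEL = 1     # second technique (parameter value "1")

Ref = Tuple[int, int]  # (output-order index, temporal level)


def build_list(technique: int, refs: List[Ref], current: int) -> List[Ref]:
    if technique == TEMPORAL_DISTANCE:
        return sorted(refs, key=lambda r: abs(current - r[0]))
    if technique == TEMPORAL_LEVEL:
        return sorted(refs, key=lambda r: (r[1], abs(current - r[0])))
    raise ValueError(f"unknown technique {technique}")


def write_header(technique: int) -> Dict[str, int]:
    """Step 404: record the selected technique in the picture header."""
    return {"ref_list_technique": technique}


def parse_header(header: Dict[str, int]) -> int:
    """Step 500: recover the selected technique from the header."""
    return header["ref_list_technique"]


refs = [(0, 0), (1, 2), (2, 1), (3, 2)]
encoder_list = build_list(TEMPORAL_LEVEL, refs, current=4)
header = write_header(TEMPORAL_LEVEL)
decoder_list = build_list(parse_header(header), refs, current=4)
```

Because both sides apply the same pre-determined technique, the decoder reconstructs the encoder's list from a single header value rather than from explicit reordering commands.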

FIG. 6 shows a block diagram illustrating an apparatus for encoding video according to the first exemplary embodiment of the present invention. It will be apparent to the person skilled in the art that modifications can be made to the example apparatus shown in FIG. 6 to implement any one of the methods of encoding video disclosed herein or other methods without departing from the scope of the present invention. That is, the apparatus for encoding video according to the present invention is not limited to the components/elements, and the interconnections thereof, as shown in FIG. 6 and can be modified accordingly by the person skilled in the art for various purposes.

The exemplary apparatus for encoding video comprises a selecting unit 700, a first switch unit 702, a first creating unit 704, a second creating unit 706, a second switch unit 708, an encoding unit 710, a writing unit 712, and a memory unit 714.

As shown in FIG. 6, the selecting unit 700 is configured to select between two or more pre-determined techniques or methods for reference picture list construction and output a parameter or flag indicating the selection D701. The selection parameter D701 is used by the first switch unit 702 for sending stored reference pictures D703 from the memory unit 714 either to the first creating unit 704 or to the second creating unit 706. The first creating unit 704 or the second creating unit 706 is configured to create reference picture lists according to the selected pre-determined technique of reference picture list construction. Based on the selection parameter D701, the second switch unit 708 is configured to send either the reference picture lists D707 produced by the first creating unit 704 or the reference picture lists D711 produced by the second creating unit 706 to the encoding unit 710. The encoding unit 710 is configured to receive the reference picture lists D713, an original uncompressed image D715 and stored reference pictures D703, then perform encoding, for example, using motion-compensated inter-picture prediction, and output a coded picture D717. The writing unit 712 is configured to write the coded picture D717 and the selection parameter D701 into a coded video bitstream D719.

FIG. 7 shows a block diagram illustrating an apparatus for decoding video according to the first exemplary embodiment of the present invention. It will be apparent to the person skilled in the art that modifications can be made to the example apparatus shown in FIG. 7 to implement any one of the methods of decoding video disclosed herein or other methods without departing from the scope of the present invention. That is, the apparatus for decoding video according to the present invention is not limited to the components/elements, and the interconnections thereof, as shown in FIG. 7 and can be modified accordingly by the person skilled in the art for various purposes.

The apparatus comprises a parsing unit 800, a first switch unit 802, a first creating unit 804, a second creating unit 806, a second switch unit 808, and a decoding unit 810.

As shown in FIG. 7, the parsing unit 800 is configured to parse a header of a current picture (coded video bitstream) D801 to obtain a parameter or flag D803 indicating the selected pre-determined technique for reference picture list construction. Based on the parsed parameter D803, the first switch unit 802 is configured to send the stored reference pictures D801 either to the first creating unit 804 or to the second creating unit 806, which creates reference picture lists according to the selected predetermined technique of reference picture list construction. Based on the parsed parameter D803, the second switch unit 808 is configured to send either the reference picture lists D807 from the first creating unit 804 or the reference picture lists D811 from the second creating unit 806 to the decoding unit 810. The decoding unit 810 is configured to receive the reference picture lists D813, the coded video bitstream D801 and the stored reference pictures D801, and to perform decoding, for example, using motion-compensated inter-picture prediction, to produce a reconstructed picture D817.

FIG. 8 shows a diagram illustrating example locations of the parameter or flag for indicating the selected pre-determined technique for reference list construction in a coded video bitstream. For example, the parameter or flag can have a value of “0” for a first pre-determined technique and a value of “1” for a second pre-determined technique. According to embodiments of the present invention, FIG. 8(a) shows a location of the parameter in a sequence header of a compressed video bitstream, FIG. 8(b) shows a location of the parameter in a picture header of a compressed video bitstream, FIG. 8(c) shows a location of the parameter in a slice header of a compressed video bitstream, and FIG. 8(d) shows that the parameter can also be derived from a pre-defined look-up table based on a profile parameter, a level parameter, or both the profile and level parameters located in a sequence header of a compressed video bitstream.
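The derivation of FIG. 8(d) can be sketched with a look-up table keyed on profile and level. The table contents, the numeric profile and level identifiers, and the fallback to the first technique are all hypothetical; the text does not specify any of them.

```python
from typing import Dict, Tuple

# Hypothetical table: (profile identifier, level identifier) -> technique flag.
TECHNIQUE_TABLE: Dict[Tuple[int, int], int] = {
    (1, 30): 0,  # first technique: sort by temporal distance
    (1, 40): 1,  # second technique: sort by temporal level
    (2, 40): 1,
}


def derive_technique(profile_id: int, level_id: int, default: int = 0) -> int:
    """Derive the list-construction flag without transmitting it explicitly."""
    return TECHNIQUE_TABLE.get((profile_id, level_id), default)


flag_a = derive_technique(1, 40)
flag_b = derive_technique(3, 10)  # not in the table, so the default applies
```

Since the encoder and decoder would share the same pre-defined table, no additional bits are needed in the header for the flag in this variant.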

FIG. 9 depicts a flowchart illustrating a method of encoding video using a plurality of reference pictures according to a second exemplary embodiment of the present invention. As a first step 900, the method comprises writing a parameter indicating a temporal level of a reference picture or a period of a type of reference picture into the reference picture for each of the plurality of reference pictures. For example, as illustrated in FIG. 3, the parameter indicating the temporal level may be a value such as “0”, “1” or “2”, etc., whereby the lowest value of temporal level indicates the reference picture(s) at the lowest frame rate as described hereinbefore. The parameter indicating a period of a type of a reference picture may be a value such as “1” or “2”, etc., whereby the value indicates the interval at which the particular type of reference picture (e.g., primary pictures) occurs periodically. By way of example, if the parameter indicating the period of the type of the reference pictures is “4”, this indicates that that particular type of reference picture occurs periodically every 4 frames in output order.
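A list sorted using the period parameter can be sketched as follows. The sketch assumes that the periodic type of picture starts at output-order index 0, so that a picture is of that type when its index is a multiple of the period, and that such pictures are placed at the top of the list; neither assumption is stated in the text, and the function names are illustrative.

```python
from typing import List


def is_periodic_type(poc: int, period: int) -> bool:
    """True if the picture falls on the assumed periodic grid."""
    return poc % period == 0


def sort_by_period(ref_pocs: List[int], current: int, period: int) -> List[int]:
    """Periodic-type pictures first, then the rest, nearest first in each group."""
    return sorted(
        ref_pocs,
        key=lambda p: (not is_periodic_type(p, period), abs(current - p)),
    )


# Period "4": pictures 0, 4, 8, ... are of the periodic type.
first_list = sort_by_period([3, 4, 5, 6], current=7, period=4)
```

Here picture 4 is moved to the top even though picture 6 is temporally closer to the current picture.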

The method of encoding video further comprises a step 902 of creating afirst list of reference pictures comprising the plurality of referencepictures sorted based on the parameter. For example, in the case of theparameter being indicative of a temporal level of the reference picture,step 902 creates a first list of reference pictures sorted based on thetemporal level of the reference pictures such as ordering lowerreference pictures having lower temporal level so as to be at the top ofthe first list.

The method of encoding video further comprises a step 904 of encoding a current picture of the video using at least the first list of reference pictures. For example, encoding the current picture comprises performing motion estimation and motion prediction for the current picture using at least the first list of reference pictures.

FIG. 10 depicts a flowchart illustrating a method of decoding video using a plurality of reference pictures according to the second exemplary embodiment. As a first step 1002, the method comprises parsing a parameter indicating a temporal level of a reference picture or a period of a type of reference picture from each of the plurality of reference pictures. As described hereinbefore, by way of example, the parameter indicating the temporal level may be a value such as “0”, “1” or “2”, etc., whereby the lowest value of temporal level indicates the reference picture(s) at the lowest frame rate. The parameter indicating a period of a type of reference picture may be a value such as “1” or “2”, etc., whereby the value indicates the reoccurrence period (i.e., interval) of that particular type of reference picture (e.g., primary pictures).

The method of decoding video further comprises a step 1004 of creating a first list of reference pictures including the plurality of reference pictures sorted based on the parameter; and a step 1006 of decoding the current picture of the video using at least the first list of reference pictures. For example, decoding the current picture comprises performing motion prediction for the current picture using at least the first list of reference pictures.

An apparatus for encoding video using a plurality of reference picturesaccording to the second exemplary embodiment of the present inventioncomprises a writing unit, a first list creation unit, and an encodingsection. The writing unit is configured to write the parameter into eachof the plurality of reference pictures, the first list creation unit isconfigured to create the first list of reference pictures comprising theplurality of reference pictures sorted based on the parameter, and theencoding section is configured to encode the current picture of thevideo using at least the first list of reference pictures. For example,the encoding section may comprise a motion estimation unit configured toperform motion estimation for the current picture using at least thefirst list of reference pictures and a motion prediction unit configuredto perform motion prediction for the current picture using at least thefirst list of reference pictures.

An apparatus for decoding video using a plurality of reference picturesaccording to the second exemplary embodiment of the present inventioncomprises a parsing unit, a first list creation unit, and a decodingsection. The parsing unit is configured for parsing the parameter fromeach of the plurality of reference pictures, the first list creationunit is configured to create the first list of reference picturescomprising the plurality of reference pictures sorted based on theparameter, and the decoding section is configured to decode the currentpicture of the video using at least the first list of referencepictures. For example, the decoding section may comprise a motionprediction unit configured to perform motion prediction for the currentpicture using at least the first list of reference pictures.

The second exemplary embodiment of the present invention has been found to provide an improvement in encoding/decoding efficiency. As discussed in the background, a problem with the prior art is that reference pictures closest to the current picture are always sorted to the top of the reference lists. In contrast, according to the second exemplary embodiment of the present invention, a parameter indicating a temporal level of a reference picture or a period of a type of the reference picture is written into the reference pictures, and at least one reference list is created with reference pictures sorted based on the parameter. For example, the case where the parameter indicates the temporal level of the reference picture is illustrated in FIG. 3. Referring to FIG. 3, based on prior art teaching, the reference list for Picture/Frame “4” in Example #1 would be created based only on the temporal distance, that is, Picture “3” being closest to Picture “4” would be arranged at the top of the list, followed by Picture “2” and Picture “1”, with Picture “0” at the bottom of the reference list. However, due to the higher temporal level of Picture “3”, Picture “3” may not be the best reference picture/frame to use for inter picture prediction in respect of Picture “4” and thus it may not be ideal to arrange such a picture at the top of the list. In contrast, according to the second exemplary embodiment, Picture “0” having been assigned a parameter with the lowest temporal level amongst Pictures “0” to “3” would be arranged at the top of the list, followed by Picture “2”, Picture “3” and Picture “1”. As a result, the more suitable or appropriate reference picture(s) are arranged at the top of the list and are therefore represented with the fewest bits for use in inter picture prediction. Accordingly, better encoding/decoding of video can be achieved according to the second exemplary embodiment of the present invention.

Further exemplary embodiments of the present invention will now be described hereinafter with reference to the Figures, providing more specific examples of the second exemplary embodiment of the present invention. It will be appreciated by the person skilled in the art that the exemplary embodiments described hereinafter are merely provided by way of example and do not restrict the scope of the present invention.

FIG. 11A shows a flowchart illustrating a process or method of encodingvideo using multiple reference pictures according to a third exemplaryembodiment of the present invention. As shown in FIG. 11A, in step 1100,a parameter for indicating/classifying a temporal level of a referencepicture is written into a header of the reference picture (e.g., a codedslice of the reference picture). An example of the parameter classifyingthe temporal level is the syntax parameter temporal_id in HEVC videocoding scheme. In step 1102, a first list of reference pictures sortedbased on the temporal distance to a current picture is created. In step1104, a second list of reference pictures sorted also based on thetemporal distance to the current picture is created. Then in step 1106,a comparison is performed to determine or judge whether the first listmatches (e.g., is identical to) the second list.

If the first list matches the second list, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment) sorted by using at least the temporal level of the reference pictures is created in step 1108. Exemplary embodiments of sorting/ordering the third list of reference pictures by at least the temporal level of the reference pictures will be described later with reference to FIGS. 13 and 14. Next, a motion estimation process is performed for the current picture (e.g., blocks of image samples) using at least the third list of reference pictures in step 1110 and a motion prediction process is performed for the current picture using at least the third list of reference pictures in step 1112. For example, the motion estimation process may be performed using only the third list, using the second and third lists, or using the first and third lists.

In an embodiment of the present invention, the third list is created byreordering the reference pictures from either the first list or thesecond list using at least the temporal level of the reference picturesto the current picture. In such an embodiment, the third list ofreference pictures effectively represents a modified version of eitherthe first list or the second list of reference pictures.

If the first list does not match the second list in step 1106, a motion estimation process is performed for the current picture using the first and second lists of reference pictures in step 1114 and a motion prediction process is performed for the current picture using the first and second lists of reference pictures in step 1116.

In an embodiment, the logic at step 1106 may be switched. In particular, if the first list matches the second list, a motion estimation process is performed for the current picture using the first and second lists of reference pictures in step 1114 and a motion prediction process is performed for the current picture using the first and second lists of reference pictures in step 1116. On the other hand, if the first list does not match the second list in step 1106, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment) sorted by at least the temporal level of the reference pictures is created in step 1108. Next, a motion estimation process is performed for the current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 1110 and a motion prediction process is performed for the current picture using at least the third list of reference pictures in step 1112. For example, the motion estimation process and/or the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.

Yet another embodiment is shown in FIG. 11B. In particular, steps 1106, 1114 and 1116 as shown in FIG. 11A are omitted. Accordingly, after step 1104, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment) sorted by using at least the temporal level of the reference pictures is created in step 1158. Next, a motion estimation process is performed for the current picture (e.g., blocks of image samples) using at least the third list of reference pictures in step 1160 and a motion prediction process is performed for the current picture using at least the third list of reference pictures in step 1162. For example, the motion estimation process and/or the motion prediction process may be performed using the second and third lists of reference pictures, or using the first and third lists of reference pictures.
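
By way of illustration only, the list selection of FIGS. 11A and 11B may be sketched as follows. The names RefPic, sort_by_distance, sort_by_level, select_lists and the mode argument are hypothetical and do not appear in the embodiments. The temporal level “1” given to Picture “2” is an assumption consistent with the ordering described for Example #1 of FIG. 3, and pairing the third list with the second list is only one of the options mentioned above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RefPic:
    number: int          # picture number in output order (hypothetical field)
    temporal_level: int  # value of the temporal level parameter


def sort_by_distance(refs: Sequence[RefPic], current: int) -> List[RefPic]:
    """Steps 1102/1104: order by temporal distance to the current picture."""
    return sorted(refs, key=lambda r: abs(current - r.number))


def sort_by_level(refs: Sequence[RefPic], current: int) -> List[RefPic]:
    """Steps 1108/1158: temporal level first, temporal distance as tie-break."""
    return sorted(refs, key=lambda r: (r.temporal_level, abs(current - r.number)))


def select_lists(list0: Sequence[RefPic], list1: Sequence[RefPic],
                 current: int, mode: str = "match") -> Tuple[List[RefPic], List[RefPic]]:
    """Return the pair of lists handed to motion estimation/prediction.

    mode "match"    : build the third list when the lists match (FIG. 11A)
    mode "switched" : build it when the lists do not match (switched logic)
    mode "always"   : build it unconditionally (FIG. 11B)
    """
    identical = list(list0) == list(list1)
    use_third = {"match": identical, "switched": not identical, "always": True}[mode]
    if not use_third:
        return list(list0), list(list1)
    # Assumption: the third list is derived from the first list and used
    # together with the unmodified second list.
    return sort_by_level(list0, current), list(list1)


# Example #1 of FIG. 3: Pictures 0 to 3 are references for Picture 4.
refs = [RefPic(0, 0), RefPic(1, 2), RefPic(2, 1), RefPic(3, 2)]
list0 = sort_by_distance(refs, current=4)
list1 = sort_by_distance(refs, current=4)
selected, second = select_lists(list0, list1, current=4)
print([r.number for r in selected])  # [0, 2, 3, 1]
print([r.number for r in second])    # [3, 2, 1, 0]
```

In this sketch both distance-sorted lists are identical, as in the forward bi-predictive case, so the default mode replaces the first list with the list sorted by temporal level.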

FIG. 12A shows a flowchart illustrating a process or method of decoding video using multiple reference pictures according to the third exemplary embodiment of the present invention. As shown in FIG. 12A, in step 1200, a parameter indicating the temporal level of the reference picture is parsed or determined from a header of each of the multiple reference pictures. Next, in step 1202, a first list of reference pictures sorted based on the temporal distance to a current picture is created. In step 1204, a second list of reference pictures sorted also based on the temporal distance to the current picture is created. Then in step 1206, a comparison is performed to determine or judge whether the first list matches (e.g., is identical to) the second list.

If the first list matches the second list, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment) sorted by at least the temporal level of the reference pictures is created in step 1208. Similarly, exemplary embodiments of sorting/ordering the third list of reference pictures by at least the temporal level of the reference pictures will be described later with reference to FIGS. 13 and 14. Next, a motion prediction process is performed for the current picture (e.g., blocks of image samples) using at least the third list of reference pictures in step 1210. For example, the motion prediction process may be performed using the second and third lists of reference pictures, or using the first and third lists of reference pictures.

If the first list does not match the second list in step 1206, a motion prediction process is performed for the current picture using the first and second lists of reference pictures in step 1212.

In an embodiment, the logic at step 1206 may be switched. In particular,if the first list matches the second list, a motion prediction processis performed for the current picture using the first and second lists ofreference pictures in step 1212. On the other hand, if the first listdoes not match the second list in step 1206, a third list of referencepictures (e.g., corresponding to the first list of reference picturesdescribed in the second exemplary embodiment) sorted at least based onthe temporal level of the reference pictures is created in step 1208.Next, a motion prediction process is performed for the current pictureusing at least the third list of reference pictures in step 1210. Forexample, the motion prediction process may be performed using only thethird list, using the second and third lists, or using the first andthird lists.

Yet another embodiment is shown in FIG. 12B. In particular, steps 1206and 1212 as shown in FIG. 12A are omitted. Accordingly, after step 1204,a third list of reference pictures (e.g., corresponding to the firstlist of reference pictures described in the second exemplary embodiment)sorted by at least the temporal level of the reference pictures iscreated in step 1258. Next, a motion prediction process is performed forthe current picture (e.g., blocks of image samples) using at least thethird list of reference pictures in step 1260. For example, the motionprediction process may be performed using only the third list, using thesecond and third lists, or using the first and third lists.

FIG. 13 shows a flowchart illustrating an embodiment of the presentinvention for sorting/ordering the third list of reference pictures byat least the temporal level of the reference pictures. In step 1300, thereference pictures in the list are sorted in the order of increasingtemporal level. In step 1302, if two or more reference pictures have thesame temporal level, the two or more reference pictures are in turnsorted according to their respective temporal distance to the currentpicture. For example, in step 1302, reference pictures having the sametemporal level are sorted in the order of increasing temporal distanceto the current picture, such that the reference pictures with shortertemporal distance to the current picture are located higher (smallerreference index value) in the third list as compared to the referencepictures with longer temporal distance to the current picture. By way ofexample, in Example #1 shown in FIG. 3, Pictures “1” and “3” have thesame temporal level “2”. Accordingly, Pictures “1” and “3” are furthersorted based on their respective temporal distance to the currentpicture “4”. As a result, Picture “3” is arranged so as to be abovePicture “1” in the list.
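
As a minimal sketch of the ordering of FIG. 13, the two steps may be expressed as two stable sorts. The function name order_third_list and the (picture number, temporal level) tuple representation are hypothetical, and the temporal level “1” for Picture “2” is an assumption consistent with Example #1 of FIG. 3.

```python
from typing import List, Sequence, Tuple

Ref = Tuple[int, int]  # (picture number in output order, temporal level)


def order_third_list(refs: Sequence[Ref], current_number: int) -> List[Ref]:
    # Step 1302 (tie-break): increasing temporal distance to the current picture.
    by_distance = sorted(refs, key=lambda r: abs(current_number - r[0]))
    # Step 1300 (primary key): increasing temporal level. Python's sort is
    # stable, so pictures with equal temporal level keep the distance order.
    return sorted(by_distance, key=lambda r: r[1])


# Example #1 of FIG. 3 with Picture 4 as the current picture.
refs = [(0, 0), (1, 2), (2, 1), (3, 2)]
third = order_third_list(refs, current_number=4)
print([number for number, _ in third])  # [0, 2, 3, 1]
```

Pictures “3” and “1” share temporal level “2”, so Picture “3”, being closer to Picture “4”, receives the smaller reference index.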

FIG. 14 shows a flowchart illustrating another embodiment of the present invention for sorting/ordering the third list of reference pictures by at least the temporal level of the reference pictures. In step 1400, a first group of reference pictures comprising one or more reference pictures having temporal level equal to a pre-determined value is selected. An example of the pre-determined value may be “0” indicating the lowest temporal level (corresponding to the lowest frame rate representation). Next, in step 1402, a second group of reference pictures comprising reference pictures that are not included in the first group of reference pictures is selected. In step 1404, the first group of reference pictures is positioned/sorted at the top of the third list whereby the reference pictures within the first group are sorted according to their respective temporal distance to the current picture. Next, in step 1406, the second group of reference pictures is positioned/sorted after/below the first group of reference pictures in the third list according to their respective temporal distance to the current picture. For example, the reference pictures within the first group and/or the second group are sorted in the order of increasing temporal distance to the current picture.
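
The grouping of FIG. 14 may be sketched as below, again purely for illustration. The function name order_by_groups and the tuple representation are hypothetical, and the temporal levels in the example are assumed values consistent with Example #1 of FIG. 3.

```python
from typing import List, Sequence, Tuple

Ref = Tuple[int, int]  # (picture number in output order, temporal level)


def order_by_groups(refs: Sequence[Ref], current_number: int,
                    predetermined_level: int = 0) -> List[Ref]:
    def distance(ref: Ref) -> int:
        return abs(current_number - ref[0])

    # Step 1400: pictures whose temporal level equals the pre-determined value.
    first_group = [r for r in refs if r[1] == predetermined_level]
    # Step 1402: all remaining pictures.
    second_group = [r for r in refs if r[1] != predetermined_level]
    # Steps 1404/1406: first group on top, second group below, each sorted by
    # increasing temporal distance to the current picture.
    return sorted(first_group, key=distance) + sorted(second_group, key=distance)


refs = [(0, 0), (1, 2), (2, 1), (3, 2)]
grouped = order_by_groups(refs, current_number=4)
print([number for number, _ in grouped])  # [0, 3, 2, 1]
```

Unlike the ordering of FIG. 13, only the pre-determined temporal level is promoted; the remaining pictures keep a purely distance-based order, so Picture “3” precedes Picture “2” in this example.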

FIG. 15 shows a block diagram illustrating an example apparatus forencoding video according to the third exemplary embodiment of thepresent invention. It will be apparent to the person skilled in the artthat modifications can be made to the example apparatus shown in FIG. 15to implement any one of the methods of encoding video disclosed herein(e.g., the method as shown in FIG. 9) or other methods without departingfrom the scope of the present invention. That is, the apparatus forencoding video according to the present invention is not limited to thecomponents/elements, and the interconnections thereof, as shown in FIG.15 and can be modified accordingly by the person skilled in the art forvarious purposes.

The exemplary apparatus for encoding video comprises a motion estimationunit 1500, a motion prediction unit 1502, a first list creation unit1504, a second list creation unit 1516, a third list creation unit 1510,a first switch unit 1506, a second switch unit 1508, a memory unit 1512,a comparator unit or a determining unit 1514 and a writing unit 1518.

As shown in FIG. 15, the motion estimation unit 1500 is configured or operable to read a current picture (e.g., a block of image samples) D1501, a selected list of reference pictures D1511 and a second list of reference pictures D1519, and output a set of motion vectors D1503. The motion prediction unit 1502 is configured to read the set of motion vectors D1503, the selected list of reference pictures D1511 and the second list of reference pictures D1519, and output a block of predicted samples D1505. The first list creation unit 1504 is configured to read reference pictures D1513 from the memory unit 1512 and output a first list of reference pictures D1515. The second list creation unit 1516 is configured to read reference pictures D1517 from the memory unit 1512 and output a second list of reference pictures D1519. The comparator unit 1514 is configured to read both the first list of reference pictures D1515 and the second list of reference pictures D1519, and output a control signal D1521 to control the first switch unit 1506 and the second switch unit 1508. The first switch unit 1506 is configured to send the first list of reference pictures D1515 either to the second switch unit 1508 or to the third list creation unit 1510 based on the control signal D1521. The third list creation unit 1510 is configured to create a third list of reference pictures D1523 based on the first list of reference pictures D1509 and the temporal level identifier parameter of the reference pictures D1525 stored in the memory unit 1512. The second switch unit 1508 is configured to select either the first list of reference pictures D1507 or the third list of reference pictures D1523 based on the control signal D1521. The writing unit 1518 is configured to read the temporal level of the current picture D1527 and write a corresponding temporal level identifier parameter into a header of the current picture D1529.

FIG. 16 shows a block diagram illustrating an example apparatus fordecoding video according to the third exemplary embodiment of thepresent invention. Similarly, it will be apparent to the person skilledin the art that modifications can be made to the example apparatus shownin FIG. 16 to implement any one of the methods of decoding videodisclosed herein (e.g., the method as shown in FIG. 10) or other methodswithout departing from the scope of the present invention. That is, theapparatus for decoding video according to the present invention is notlimited to the components/elements, and the interconnections thereof, asshown in FIG. 16 and can be modified accordingly for various purposes.

The example apparatus for decoding video comprises a motion predictionunit 1600, a first list creation unit 1602, a second list creation unit1614, a third list creation unit 1608, a first switch unit 1604, asecond switch unit 1606, a memory unit 1610, a comparator unit or adetermining unit 1612 and a parsing unit 1616.

As shown in FIG. 16, the motion prediction unit 1600 is configured or operable to read a decoded set of motion vectors D1601, a selected list of reference pictures D1609 and a second list of reference pictures D1617, and output a block of predicted samples D1603. The first list creation unit 1602 is configured to read reference pictures D1611 from the memory unit 1610 and output a first list of reference pictures D1613. The second list creation unit 1614 is configured to read reference pictures D1615 from the memory unit 1610 and output a second list of reference pictures D1617. The comparator unit 1612 is configured to read both the first list of reference pictures D1613 and the second list of reference pictures D1617, and output a control signal D1619 to control the first switch unit 1604 and the second switch unit 1606. The first switch unit 1604 is configured to send the first list of reference pictures D1613 either to the second switch unit 1606 or to the third list creation unit 1608 based on the control signal D1619. The third list creation unit 1608 is operable to create a third list of reference pictures D1621 based on the first list of reference pictures D1607 and the temporal level identifier parameter of the reference pictures D1623 stored in the memory unit 1610. The second switch unit 1606 is operable to select either the first list of reference pictures D1605 or the third list of reference pictures D1621 based on the control signal D1619. The parsing unit 1616 is operable to parse a header of a coded picture D1625 and output the parsed temporal level identifier parameter D1627 into the memory unit 1610.

FIG. 17 shows a diagram illustrating an example location of the parameter indicating the temporal level of a picture in a header of the picture. In a coded video bitstream, each picture is represented in one or more slice Network Abstraction Layer Units (“NALUs”). As illustrated in FIG. 17, the parameter for indicating/classifying the temporal level of a picture can be located in the NALU header of a slice NALU.
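
For illustration, reading a temporal level from a NALU header may be sketched as follows. The sketch assumes the two-byte NAL unit header layout of the published HEVC (H.265) specification, namely forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits) and nuh_temporal_id_plus1 (3 bits); the header layout shown in FIG. 17 may differ, and the function name parse_nal_unit_header is hypothetical.

```python
from typing import Tuple


def parse_nal_unit_header(header: bytes) -> Tuple[int, int, int]:
    """Return (nal_unit_type, nuh_layer_id, temporal_id) from a 2-byte header.

    Assumption: published HEVC layout, most significant bit first.
    """
    if len(header) < 2:
        raise ValueError("NAL unit header needs two bytes")
    value = int.from_bytes(header[:2], "big")
    forbidden_zero_bit = value >> 15
    nal_unit_type = (value >> 9) & 0x3F
    nuh_layer_id = (value >> 3) & 0x3F
    temporal_id_plus1 = value & 0x07
    if forbidden_zero_bit != 0 or temporal_id_plus1 == 0:
        raise ValueError("malformed NAL unit header")
    # The temporal level is carried as "value plus one" in this layout.
    return nal_unit_type, nuh_layer_id, temporal_id_plus1 - 1


print(parse_nal_unit_header(bytes([0x02, 0x03])))  # (1, 0, 2)
```

A parsing unit could store the returned temporal level alongside the decoded picture so that it is available when the third list is created.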

FIG. 18A shows a flowchart illustrating a process or method of encodingvideo using multiple reference pictures according to a fourth exemplaryembodiment of the present invention. As shown in FIG. 18A, in step 1800,a parameter representing or indicating a period of a type of a referencepicture is written into a header of the picture (coded video bitstream).The period of a type of a picture refers to the period in which thattype of picture will reoccur periodically, which can therefore also bereferred to as an interval of a type of a picture. For example, theparameter value may be “4”, which indicates that that particular type ofpictures (e.g., primary pictures) occurs periodically every 4 frames inoutput order. Next, in step 1802, a first list of reference picturessorted based on the temporal distance to a current picture is created.In step 1804, a second list of reference pictures sorted also based onthe temporal distance to the current picture is created. In step 1806, acomparison is performed to judge or determine whether the first listmatches (e.g., is identical to) the second list.

If the first list matches the second list, in step 1808, a third list of reference pictures sorted at least based on the period of the type of the reference pictures is created. Sorting/ordering the third list of reference pictures at least based on the period of the type of reference pictures can be similarly performed as described hereinbefore with reference to FIGS. 13 and 14, but with the parameter for indicating the temporal level of the reference pictures substituted with the parameter for indicating a period or interval of a type of picture. Next, a motion estimation process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 1810 and a motion prediction process is performed for the current picture using at least the third list of reference pictures in step 1812. For example, the motion estimation process and/or the motion prediction process may be performed for the current picture using only the third list, using the second and third lists, or using the first and third lists.

In an embodiment of the present invention, the third list is created byreordering the reference pictures from either the first list or thesecond list using at least a period of a type of the reference picturesfor sorting reference pictures. In such an embodiment, the third list ofreference pictures effectively represents a modified version of eitherthe first list or the second list of reference pictures.

If the first list does not match the second list in step 1806, a motion estimation process is performed for the current picture using the first and second lists of reference pictures in step 1814 and a motion prediction process is performed for the current picture using the first and second lists of reference pictures in step 1816.

In an embodiment, the logic at step 1806 may be switched. In particular,if the first list matches the second list, a motion estimation processis performed for the current picture using the first and second lists ofreference pictures in step 1814 and a motion prediction process isperformed for the current picture using the first and second lists ofreference pictures in step 1816. On the other hand, if the first listdoes not match the second list in step 1806, a third list of referencepictures (e.g., corresponding to the first list of reference picturesdescribed in the second exemplary embodiment) sorted by at least theperiod of the type of the reference pictures is created in step 1808.Next, a motion estimation process is performed for the current pictureusing at least the third list of reference pictures in step 1810 and amotion prediction process is performed for the current picture using atleast the third list of reference pictures in step 1812. For example,the motion estimation process and/or the motion prediction process maybe performed using the second and third lists of reference pictures, orusing the first and third lists of reference pictures.

Yet another embodiment is shown in FIG. 18B. In particular, steps 1806,1814 and 1816 as shown in FIG. 18A are omitted. Accordingly, after step1804, a third list of reference pictures (e.g., corresponding to thefirst list of reference pictures described in the second exemplaryembodiment) sorted by at least the period of the type of the referencepictures is created in step 1858. Next, a motion estimation process isperformed for the current picture (e.g., blocks of image samples) usingat least the third list of reference pictures in step 1860 and a motionprediction process is performed for the current picture using at leastthe third list of reference pictures in step 1862. For example, themotion estimation process may be performed using only the third list,using the second and third lists, or using the first and third lists.

FIG. 19A shows a flowchart illustrating a process or method for decodingvideo using multiple reference pictures according to the fourthembodiment of the present invention. As shown in FIG. 19A, in step 1900,a parameter indicating a period of a type of pictures (e.g., primarypictures) is parsed from a header of the picture (coded videobitstream). Next, in step 1902, a first list of reference picturessorted based on the temporal distance to a current picture is created.In step 1904, a second list of reference pictures sorted also based onthe temporal distance to the current picture is created. In step 1906, acomparison is performed to judge or determine whether the first listmatches the second list.

If the first list matches the second list, in step 1908, a third list of reference pictures sorted at least based on a period of the type of reference pictures is created. Sorting/ordering the third list of reference pictures at least based on a period of the type of reference pictures can be similarly performed as described hereinbefore with reference to FIGS. 13 and 14, but with the parameter for indicating the temporal level of the reference pictures substituted with the parameter for indicating a period of a type of picture. Next, a motion prediction process is performed for the current picture using at least the third list of reference pictures in step 1910. For example, the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.

If the first list does not match the second list in step 1906, a motion prediction process is performed for the current picture using the first and second lists of reference pictures in step 1912.

In an embodiment, the logic at step 1906 may be switched. In particular, if the first list matches the second list, a motion prediction process is performed for the current picture using the first and second lists of reference pictures in step 1912. On the other hand, if the first list does not match the second list in step 1906, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment) sorted based on at least the period of a type of the reference pictures is created in step 1908. Next, a motion prediction process is performed for the current picture using at least the third list of reference pictures in step 1910. Similarly, the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.

Yet another embodiment is shown in FIG. 19B. In particular, steps 1906and 1912 as shown in FIG. 19A are omitted. Accordingly, after step 1904,a third list of reference pictures (e.g., corresponding to the firstlist of reference pictures described in the second exemplary embodiment)sorted by at least the period of the type of the reference pictures iscreated in step 1958. Next, a motion prediction process is performed forthe current picture (e.g., blocks of image samples) using at least thethird list of reference pictures in step 1960. For example, the motionprediction process may be performed using only the third list, using thesecond and third lists, or using the first and third lists.

FIG. 20 shows a flowchart illustrating another embodiment of the present invention for sorting the third list of reference pictures at least based on a period of the type of reference pictures. In step 2000, a first group of reference pictures is selected comprising one or more key reference pictures having picture numbers equal to integer multiples of a period of a type of pictures (e.g., a period of the primary pictures). For example, if the period is 4, the first group of pictures includes one or more reference pictures having picture numbers equal to 0, 4, 8, 12, 16 and so on. The term picture number refers to the output order of coded pictures. Next, in step 2002, a second group of reference pictures comprising reference pictures that are not included in the first group of reference pictures is selected. Then, in step 2004, the first group of reference pictures is positioned/sorted at the top of the third list whereby the reference pictures within the first group are sorted according to temporal distance to the current picture. Next, in step 2006, the second group of reference pictures is positioned/sorted after/below the first group of reference pictures in the third list whereby the reference pictures within the second group are sorted according to temporal distance to the current picture. For example, the reference pictures within the first group and/or the second group are sorted in the order of increasing temporal distance to the current picture.
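
A minimal sketch of the period-based ordering of FIG. 20 is given below. The function name order_by_period is hypothetical, reference pictures are represented only by their picture numbers in output order, and the example reference set is an assumed one rather than a case taken from the Figures.

```python
from typing import List, Sequence


def order_by_period(ref_numbers: Sequence[int], current_number: int,
                    period: int) -> List[int]:
    if period <= 0:
        raise ValueError("period must be a positive integer")

    def distance(number: int) -> int:
        return abs(current_number - number)

    # Step 2000: key pictures, whose picture number is a multiple of the period.
    first_group = [n for n in ref_numbers if n % period == 0]
    # Step 2002: every reference picture not in the first group.
    second_group = [n for n in ref_numbers if n % period != 0]
    # Steps 2004/2006: first group on top, second group below, each sorted by
    # increasing temporal distance to the current picture.
    return sorted(first_group, key=distance) + sorted(second_group, key=distance)


# Assumed example: Pictures 0 to 7 are references for Picture 8, period 4.
print(order_by_period(range(8), current_number=8, period=4))
# [4, 0, 7, 6, 5, 3, 2, 1]
```

With a period of 4, Pictures “4” and “0” are placed ahead of the temporally closer Pictures “7”, “6” and “5”.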

FIG. 21 shows a block diagram illustrating an example apparatus forencoding video according to the fourth exemplary embodiment of thepresent invention. It will be apparent to the person skilled in the artthat modifications can be made to the example apparatus shown in FIG. 21to implement any one of the methods of encoding video disclosed hereinor other methods without departing from the scope of the presentinvention. That is, the apparatus for encoding video according to thepresent invention is not limited to the components/elements, and theinterconnections thereof, as shown in FIG. 21 and can be modifiedaccordingly for various purposes.

The exemplary apparatus for encoding video comprises a motion estimation unit 2100, a motion prediction unit 2102, a first list creation unit 2104, a second list creation unit 2116, a third list creation unit 2110, a first switch unit 2106, a second switch unit 2108, a memory unit 2112, a comparator unit 2114 and a writing unit 2118.

As shown in FIG. 21, the motion estimation unit 2100 is configured to read a block of image samples D2101, a selected list of reference pictures D2111 and a second list of reference pictures D2119, and output a set of motion vectors D2103. The motion prediction unit 2102 is configured to read the set of motion vectors D2103, the selected list of reference pictures D2111 and the second list of reference pictures D2119, and output a block of predicted samples D2105. The first list creation unit 2104 is configured to read reference pictures D2113 from the memory unit 2112 and output a first list of reference pictures D2115. The second list creation unit 2116 is configured to read reference pictures D2117 from the memory unit 2112 and output a second list of reference pictures D2119. The comparator unit 2114 is configured to read both the first list of reference pictures D2115 and the second list of reference pictures D2119, and output a control signal D2121 to control the first switch unit 2106 and the second switch unit 2108. The first switch unit 2106 is configured to send the first list of reference pictures D2115 either to the second switch unit 2108 or to the third list creation unit 2110 based on the control signal D2121. The third list creation unit 2110 is configured to create a third list of reference pictures D2123 based on the first list of reference pictures D2109 and a period of the type of reference pictures D2125 stored in the memory unit 2112. The second switch unit 2108 is configured to select either the first list of reference pictures D2107 or the third list of reference pictures D2123 based on the control signal D2121. The writing unit 2118 is configured to read the period of the type of reference pictures D2125 and write a corresponding parameter representing the period into a header of a coded picture D2127.

FIG. 22 shows a block diagram illustrating an example apparatus for decoding video according to the fourth exemplary embodiment of the present invention. Similarly, it will be apparent to the person skilled in the art that modifications can be made to the example apparatus shown in FIG. 22 to implement any one of the methods of decoding video disclosed herein or other methods without departing from the scope of the present invention. That is, the apparatus for decoding video according to the present invention is not limited to the components/elements, and the interconnections thereof, as shown in FIG. 22 and can be modified accordingly for various purposes.

The exemplary apparatus for decoding video comprises a motion prediction unit 2200, a first list creation unit 2202, a second list creation unit 2214, a third list creation unit 2208, a first switch unit 2204, a second switch unit 2206, a memory unit 2210, a comparator unit or a determining unit 2212 and a parsing unit 2216.

As shown in FIG. 22, the motion prediction unit 2200 is configured to read a decoded set of motion vectors D2201, a selected list of reference pictures D2209 and a second list of reference pictures D2217, and output a block of predicted samples D2203. The first list creation unit 2202 is configured to read reference pictures D2211 from the memory unit 2210 and output a first list of reference pictures D2213. The second list creation unit 2214 is configured to read reference pictures D2215 from the memory unit 2210 and output a second list of reference pictures D2217. The comparator unit 2212 is configured to read both the first list of reference pictures D2213 and the second list of reference pictures D2217 and output a control signal D2219 to control the first switch unit 2204 and the second switch unit 2206. The first switch unit 2204 is configured to send the first list of reference pictures D2213 either to the second switch unit 2206 or to the third list creation unit 2208 based on the control signal D2219. The third list creation unit 2208 is configured to create a third list of reference pictures D2221 based on the first list of reference pictures D2207 and a parsed period of primary pictures D2225. The second switch unit 2206 is configured to select either the first list of reference pictures D2205 or the third list of reference pictures D2221 based on the control signal D2219. The parsing unit 2216 is configured to parse a header of a coded picture D2223 and output the parsed period of primary pictures D2225.
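The routing performed by the comparator unit and the two switch units may be pictured with the following minimal sketch. It assumes, as an illustration only, that the control signal is asserted when the first and second lists are identical and that the third list is then selected; the polarity of the control signal is not fixed by the description above. The function names are hypothetical, and key_pictures_first is merely a stand-in for the third list creation unit.

```python
def comparator(first_list, second_list):
    """Comparator unit: control signal is True when both lists are identical."""
    return list(first_list) == list(second_list)


def select_list(first_list, second_list, period, create_third_list):
    """Model the two switch units and return the selected reference list."""
    control = comparator(first_list, second_list)
    if control:
        # The first switch routes the first list to the third list creation
        # unit, and the second switch selects the resulting third list.
        return create_third_list(first_list, period)
    # The first switch routes the first list directly to the second switch.
    return list(first_list)


def key_pictures_first(ref_list, period):
    """Stand-in third list creation: multiples of the period move to the top."""
    # sorted() is stable, so the existing order is kept within each group.
    return sorted(ref_list, key=lambda number: number % period != 0)


print(select_list([8, 7, 6, 5, 4], [8, 7, 6, 5, 4], 4, key_pictures_first))
print(select_list([8, 7], [10, 11], 4, key_pictures_first))
```

In this sketch the lists hold picture numbers only. The second list is passed unchanged to the motion prediction unit in either case and is therefore not returned.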

FIG. 23 shows a diagram illustrating example locations of the parameter representing/indicating the period of the type of reference pictures in a header of the picture (coded video bitstream) in the case where the type of reference pictures is primary reference pictures. The period of primary pictures is used for indicating the periodic occurrence of primary (key) pictures which are typically coded with higher quality as compared to other non-primary pictures. According to embodiments of the present invention, FIG. 23(a) shows an example location of the parameter in a sequence header of a compressed video bitstream, FIG. 23(b) shows an example location of the parameter in a picture header of a compressed video bitstream, FIG. 23(c) shows an example location of said parameter in a slice header of a compressed video bitstream, and FIG. 23(d) shows that the parameter can also be derived from a predefined look-up table based on a profile parameter, a level parameter, or both the profile and level parameters located in a sequence header of a compressed video bitstream.
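By way of a hedged illustration, the alternatives of FIG. 23(a) to FIG. 23(d) may be combined as sketched below. The field name primary_period, the profile names, the table values and the precedence among headers are all assumptions made for this sketch and are not taken from the description or from any standard.

```python
# Hypothetical look-up table: (profile, level) -> period of primary pictures.
# The entries are placeholders for illustration only.
PERIOD_LUT = {
    ("profile_a", 1): 4,
    ("profile_a", 2): 8,
    ("profile_b", 1): 2,
}


def resolve_period(sequence_header, picture_header=None, slice_header=None):
    """Return the period from a header if present, else from the table."""
    # Assumed precedence: the most local header that carries the parameter wins.
    for header in (slice_header, picture_header, sequence_header):
        if header and "primary_period" in header:
            return header["primary_period"]
    # FIG. 23(d): derive the period from the profile and level parameters.
    key = (sequence_header["profile"], sequence_header["level"])
    return PERIOD_LUT[key]


seq = {"profile": "profile_a", "level": 2}
print(resolve_period(seq))                                      # from the table
print(resolve_period(seq, slice_header={"primary_period": 3}))  # from the slice header
```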

FIG. 24A shows a flowchart illustrating a process or method of encoding video according to a fifth embodiment of the present invention. As shown in FIG. 24A, in step 2400, a flag (e.g., a reordering technique selection flag) is first written or embedded into a header of a current picture. For example, the flag is used to indicate or signal one of the two or more different techniques used for ordering the reference pictures in a reference picture list.

In step 2402, a first list of reference pictures sorted based on the temporal distance to a current picture is created. Next, in step 2404, a second list of reference pictures sorted based on the temporal distance to the current picture is created. Then, in step 2406, a comparison is performed to determine or judge if the first list matches (e.g., is identical to) the second list.

If the first list matches the second list, a comparison is performed to determine if the value of the flag has or is of a predefined value. If the flag is of a predefined value, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment), which is sorted based on the temporal level of the reference pictures, is created in step 2414. If the flag is not of a predefined value, a third list of reference pictures, which is sorted by prediction dependency of the reference pictures, is created in step 2420. The prediction dependency of the reference pictures refers to the dependency in inter-picture motion compensated prediction among the reference frames. Next, a motion estimation process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 2416 and a motion prediction process is performed for the block of image samples using at least the third list of reference pictures in step 2418. For example, the motion estimation process and/or the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.

If the first list does not match the second list in step 2406, a motion estimation process is performed for a current picture using the first and second lists of reference frames in step 2408 and a motion prediction process is performed for the current picture using the first and second lists of reference frames in step 2410.

In an embodiment, the logic at step 2406 may be switched. In particular, if the first list matches the second list, a motion estimation process is performed for a current picture using the first and second lists of reference frames in step 2408 and a motion prediction process is performed for the current picture using the first and second lists of reference frames in step 2410. On the other hand, if the first list does not match the second list in step 2406, a comparison is performed to determine or judge if the value of the flag has or is of a predefined value. If the flag is of a predefined value, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment), which is sorted based on the temporal level of the reference pictures, is created in step 2414. If the flag is not of a predefined value, a third list of reference pictures, which is sorted by prediction dependency of the reference pictures, is created in step 2420. The prediction dependency of the reference pictures refers to the dependency in inter-picture motion compensated prediction among the reference frames. Next, a motion estimation process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 2416 and a motion prediction process is performed for the block of image samples using at least the third list of reference pictures in step 2418. For example, the motion estimation process and/or the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.
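The decision flow of FIG. 24A, including the variant in which the logic at step 2406 is switched, may be sketched as follows. This is an illustration under stated assumptions and not a definitive implementation: the function name and its parameters are hypothetical, the two sorting techniques are passed in as callables because they are defined elsewhere, the predefined flag value is assumed to be 1, and the third list is assumed to be built from the pictures of the first list.

```python
def choose_reference_lists(flag, first_list, second_list,
                           sort_by_temporal_level, sort_by_dependency,
                           predefined_value=1, switched=False):
    """Return the lists handed to motion estimation and motion prediction."""
    lists_match = list(first_list) == list(second_list)  # step 2406
    # With switched logic, the third list is built when the lists differ.
    build_third = lists_match != switched
    if not build_third:
        # Steps 2408 and 2410 use the first and second lists.
        return [list(first_list), list(second_list)]
    if flag == predefined_value:
        third_list = sort_by_temporal_level(first_list)  # step 2414
    else:
        third_list = sort_by_dependency(first_list)      # step 2420
    # Steps 2416 and 2418 use at least the third list; only the third list
    # is returned here for simplicity.
    return [third_list]


# Stand-in sorters, used only to make the two branches distinguishable.
by_level = sorted
by_dependency = lambda pictures: list(reversed(pictures))

print(choose_reference_lists(1, [2, 3, 1], [2, 3, 1], by_level, by_dependency))
print(choose_reference_lists(0, [2, 3, 1], [2, 3, 1], by_level, by_dependency))
print(choose_reference_lists(1, [2, 3, 1], [1, 2, 3], by_level, by_dependency))
```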

Yet another embodiment is shown in FIG. 24B. In particular, steps 2406, 2408 and 2410 shown in FIG. 24A are omitted. Accordingly, after step 2404, a comparison is performed to determine or judge if the value of the flag has or is of a predefined value in step 2462. If the flag is of a predefined value, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment), which is sorted based on the temporal level of the reference pictures, is created in step 2464. If the flag is not of a predefined value, a third list of reference pictures, which is sorted by prediction dependency of the reference pictures, is created in step 2470. The prediction dependency of the reference pictures refers to the dependency in inter-picture motion compensated prediction among the reference frames. Next, a motion estimation process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 2466 and a motion prediction process is performed for the block of image samples using at least the third list of reference pictures in step 2468.

FIG. 25A shows a flowchart illustrating a process or method of decoding video according to the fifth exemplary embodiment of the present invention. As shown in FIG. 25A, in step 2500, a flag (e.g., a reordering technique selection flag) is first parsed or retrieved from a header of a current picture. For example, the flag indicates the selected technique for the reordering of the reference pictures.

In step 2502, a first list of reference pictures sorted by a first scheme that uses temporal distance to a current picture is created. Next, in step 2504, a second list of reference pictures sorted by a second scheme that also uses temporal distance to a current picture is created. Then, in step 2506, a comparison is performed to determine or judge if the first list matches (e.g., is identical to) the second list.

If the first list matches the second list, a comparison is performed to determine or judge if the value of the flag has or is of a predefined value in step 2508. If the flag is of a predefined value, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment), which is sorted based on the temporal level of the reference pictures, is created in step 2512. If the flag is not of a predefined value, a third list of reference pictures, which is sorted by prediction dependency of the reference pictures, is created in step 2516. The prediction dependency of the reference pictures refers to the dependency in inter-picture motion compensated prediction among the reference pictures/frames. Next, a motion prediction process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 2514. For example, the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.

If the first list does not match the second list in step 2506, a motion prediction process is performed for the current picture using the first and second lists of reference frames in step 2510.

In an embodiment, the logic at step 2506 may also be switched. In particular, if the first list matches the second list, a motion prediction process is performed for the current picture using the first and second lists of reference frames in step 2510. On the other hand, if the first list does not match the second list in step 2506, a comparison is performed to determine or judge if the value of the flag has or is of a predefined value. If the flag is of a predefined value, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment), which is sorted based on the temporal level of the reference pictures, is created in step 2512. If the flag is not of a predefined value, a third list of reference pictures, which is sorted by prediction dependency of the reference pictures, is created in step 2516. The prediction dependency of the reference pictures refers to the dependency in inter-picture motion compensated prediction among the reference frames. Next, a motion prediction process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 2514. For example, the motion prediction process may be performed using only the third list, using the second and third lists, or using the first and third lists.

Yet another embodiment is shown in FIG. 25B. In particular, steps 2506 and 2510 shown in FIG. 25A are omitted. Accordingly, after step 2504, a comparison is performed to determine or judge if the value of the flag has or is of a predefined value in step 2558. If the flag is of a predefined value, a third list of reference pictures (e.g., corresponding to the first list of reference pictures described in the second exemplary embodiment), which is sorted based on the temporal level of the reference pictures, is created in step 2562. If the flag is not of a predefined value, a third list of reference pictures, which is sorted by prediction dependency of the reference pictures, is created in step 2566. The prediction dependency of the reference pictures refers to the dependency in inter-picture motion compensated prediction among the reference frames. Next, a motion prediction process is performed for a current picture (e.g., a block of image samples) using at least the third list of reference pictures in step 2564.

Embodiment 6

The processing described in each of the embodiments can be simply implemented in an independent computer system, by recording, in a recording medium, a program for implementing the configurations of the video coding method and the video decoding method described in each of the embodiments. The recording media may be any recording media as long as the program can be recorded, such as a magnetic disk, an optical disk, a magneto-optical disk, an IC card, and a semiconductor memory.

Hereinafter, the applications of the video coding method and the video decoding method described in each of the embodiments, and systems using them, will be described.

FIG. 26 illustrates an overall configuration of a content providing system ex100 for implementing content distribution services. The area for providing communication services is divided into cells of desired size, and base stations ex106, ex107, ex108, ex109, and ex110 which are fixed wireless stations are placed in each of the cells.

The content providing system ex100 is connected to devices, such as a computer ex111, a personal digital assistant (PDA) ex112, a camera ex113, a cellular phone ex114 and a game machine ex115, via the Internet ex101, an Internet service provider ex102, a telephone network ex104, as well as the base stations ex106 to ex110, respectively.

However, the configuration of the content providing system ex100 is not limited to the configuration shown in FIG. 26, and a combination in which any of the elements are connected is acceptable. In addition, each device may be directly connected to the telephone network ex104, rather than via the base stations ex106 to ex110 which are the fixed wireless stations. Furthermore, the devices may be interconnected to each other via a short distance wireless communication and others.

The camera ex113, such as a digital video camera, is capable of capturing video. A camera ex116, such as a digital video camera, is capable of capturing both still images and video. Furthermore, the cellular phone ex114 may be the one that meets any of the standards such as Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Wideband-Code Division Multiple Access (W-CDMA), Long Term Evolution (LTE), and High Speed Packet Access (HSPA). Alternatively, the cellular phone ex114 may be a Personal Handyphone System (PHS).

In the content providing system ex100, a streaming server ex103 is connected to the camera ex113 and others via the telephone network ex104 and the base station ex109, which enables distribution of images of a live show and others. In such a distribution, content (for example, video of a music live show) captured by the user using the camera ex113 is coded as described above in each of the embodiments, and the coded content is transmitted to the streaming server ex103. On the other hand, the streaming server ex103 carries out stream distribution of the transmitted content data to the clients upon their requests. The clients include the computer ex111, the PDA ex112, the camera ex113, the cellular phone ex114, and the game machine ex115 that are capable of decoding the above-mentioned coded data. Each of the devices that have received the distributed data decodes and reproduces the coded data.

The captured data may be coded by the camera ex113 or the streaming server ex103 that transmits the data, or the coding processes may be shared between the camera ex113 and the streaming server ex103. Similarly, the distributed data may be decoded by the clients or the streaming server ex103, or the decoding processes may be shared between the clients and the streaming server ex103. Furthermore, the data of the still images and video captured by not only the camera ex113 but also the camera ex116 may be transmitted to the streaming server ex103 through the computer ex111. The coding processes may be performed by the camera ex116, the computer ex111, or the streaming server ex103, or shared among them.

Furthermore, the coding and decoding processes may be performed by an LSI ex500 generally included in each of the computer ex111 and the devices. The LSI ex500 may be configured of a single chip or a plurality of chips. Software for coding and decoding video may be integrated into some type of a recording medium (such as a CD-ROM, a flexible disk, and a hard disk) that is readable by the computer ex111 and others, and the coding and decoding processes may be performed using the software. Furthermore, when the cellular phone ex114 is equipped with a camera, the image data obtained by the camera may be transmitted. The video data is data coded by the LSI ex500 included in the cellular phone ex114.

Furthermore, the streaming server ex103 may be composed of servers and computers, and may decentralize data and process the decentralized data, record, or distribute data.

As described above, the clients may receive and reproduce the coded data in the content providing system ex100. In other words, the clients can receive and decode information transmitted by the user, and reproduce the decoded data in real time in the content providing system ex100, so that the user who does not have any particular right and equipment can implement personal broadcasting.

Aside from the example of the content providing system ex100, at least one of the video coding apparatus and the video decoding apparatus described in each of the embodiments may be implemented in a digital broadcasting system ex200 illustrated in FIG. 27. More specifically, a broadcast station ex201 communicates or transmits, via radio waves to a broadcast satellite ex202, multiplexed data obtained by multiplexing audio data and others onto video data. The video data is data coded by the video coding method described in each of the embodiments. Upon receipt of the multiplexed data, the broadcast satellite ex202 transmits radio waves for broadcasting. Then, a home-use antenna ex204 with a satellite broadcast reception function receives the radio waves.

Next, a device such as a television (receiver) ex300 and a set top box (STB) ex217 decodes the received multiplexed data, and reproduces the decoded data.

Furthermore, a reader/recorder ex218 (i) reads and decodes the multiplexed data recorded on a recording medium ex215, such as a DVD and a BD, or (ii) codes video signals in the recording medium ex215, and in some cases, writes data obtained by multiplexing an audio signal on the coded data. The reader/recorder ex218 can include the video decoding apparatus or the video coding apparatus as shown in each of the embodiments. In this case, the reproduced video signals are displayed on the monitor ex219, and can be reproduced by another device or system using the recording medium ex215 on which the multiplexed data is recorded. It is also possible to implement the video decoding apparatus in the set top box ex217 connected to the cable ex203 for a cable television or to the antenna ex204 for satellite and/or terrestrial broadcasting, so as to display the video signals on the monitor ex219 of the television ex300. The video decoding apparatus may be implemented not in the set top box but in the television ex300.

FIG. 28 illustrates the television (receiver) ex300 that uses the video coding method and the video decoding method described in each of the embodiments. The television ex300 includes: a tuner ex301 that obtains or provides multiplexed data obtained by multiplexing audio data onto video data, through the antenna ex204 or the cable ex203, etc. that receives a broadcast; a modulation/demodulation unit ex302 that demodulates the received multiplexed data or modulates data into multiplexed data to be supplied outside; and a multiplexing/demultiplexing unit ex303 that demultiplexes the modulated multiplexed data into video data and audio data, or multiplexes video data and audio data coded by a signal processing unit ex306 into data.

The television ex300 further includes: a signal processing unit ex306 including an audio signal processing unit ex304 and a video signal processing unit ex305 that decode audio data and video data and code audio data and video data, respectively; and an output unit ex309 including a speaker ex307 that provides the decoded audio signal, and a display unit ex308 that displays the decoded video signal, such as a display. Furthermore, the television ex300 includes an interface unit ex317 including an operation input unit ex312 that receives an input of a user operation. Furthermore, the television ex300 includes a control unit ex310 that controls overall each constituent element of the television ex300, and a power supply circuit unit ex311 that supplies power to each of the elements. Other than the operation input unit ex312, the interface unit ex317 may include: a bridge ex313 that is connected to an external device, such as the reader/recorder ex218; a slot unit ex314 for enabling attachment of the recording medium ex216, such as an SD card; a driver ex315 to be connected to an external recording medium, such as a hard disk; and a modem ex316 to be connected to a telephone network. Here, the recording medium ex216 can electrically record information using a non-volatile/volatile semiconductor memory element for storage. The constituent elements of the television ex300 are connected to each other through a synchronous bus.

First, the configuration in which the television ex300 decodes multiplexed data obtained from outside through the antenna ex204 and others and reproduces the decoded data will be described. In the television ex300, upon a user operation through a remote controller ex220 and others, the multiplexing/demultiplexing unit ex303 demultiplexes the multiplexed data demodulated by the modulation/demodulation unit ex302, under control of the control unit ex310 including a CPU. Furthermore, the audio signal processing unit ex304 decodes the demultiplexed audio data, and the video signal processing unit ex305 decodes the demultiplexed video data, using the decoding method described in each of the embodiments, in the television ex300. The output unit ex309 provides the decoded video signal and audio signal outside, respectively. When the output unit ex309 provides the video signal and the audio signal, the signals may be temporarily stored in buffers ex318 and ex319, and others so that the signals are reproduced in synchronization with each other. Furthermore, the television ex300 may read multiplexed data not through a broadcast and others but from the recording media ex215 and ex216, such as a magnetic disk, an optical disk, and an SD card.

Next, a configuration in which the television ex300 codes an audio signal and a video signal, and transmits the data outside or writes the data on a recording medium will be described. In the television ex300, upon a user operation through the remote controller ex220 and others, the audio signal processing unit ex304 codes an audio signal, and the video signal processing unit ex305 codes a video signal, under control of the control unit ex310 using the coding method described in each of the embodiments. The multiplexing/demultiplexing unit ex303 multiplexes the coded video signal and audio signal, and provides the resulting signal outside. When the multiplexing/demultiplexing unit ex303 multiplexes the video signal and the audio signal, the signals may be temporarily stored in the buffers ex320 and ex321, and others so that the signals are reproduced in synchronization with each other. Here, the buffers ex318, ex319, ex320, and ex321 may be plural as illustrated, or at least one buffer may be shared in the television ex300. Furthermore, data may be stored in a buffer so that the system overflow and underflow may be avoided between the modulation/demodulation unit ex302 and the multiplexing/demultiplexing unit ex303, for example.

Furthermore, the television ex300 may include a configuration for receiving an AV input from a microphone or a camera other than the configuration for obtaining audio and video data from a broadcast or a recording medium, and may code the obtained data. Although the television ex300 can code, multiplex, and provide outside data in the description, it may be capable of only receiving, decoding, and providing outside data but not of coding, multiplexing, and providing outside data.

Furthermore, when the reader/recorder ex218 reads or writes multiplexed data from or on a recording medium, one of the television ex300 and the reader/recorder ex218 may decode or code the multiplexed data, and the television ex300 and the reader/recorder ex218 may share the decoding or coding.

As an example, FIG. 29 illustrates a configuration of an information reproducing/recording unit ex400 when data is read or written from or on an optical disk. The information reproducing/recording unit ex400 includes constituent elements ex401, ex402, ex403, ex404, ex405, ex406, and ex407 to be described hereinafter. The optical head ex401 irradiates a laser spot in a recording surface of the recording medium ex215 that is an optical disk to write information, and detects reflected light from the recording surface of the recording medium ex215 to read the information. The modulation recording unit ex402 electrically drives a semiconductor laser included in the optical head ex401, and modulates the laser light according to recorded data. The reproduction demodulating unit ex403 amplifies a reproduction signal obtained by electrically detecting the reflected light from the recording surface using a photo detector included in the optical head ex401, and demodulates the reproduction signal by separating a signal component recorded on the recording medium ex215 to reproduce the necessary information. The buffer ex404 temporarily holds the information to be recorded on the recording medium ex215 and the information reproduced from the recording medium ex215. The disk motor ex405 rotates the recording medium ex215. The servo control unit ex406 moves the optical head ex401 to a predetermined information track while controlling the rotation drive of the disk motor ex405 so as to follow the laser spot. The system control unit ex407 controls overall the information reproducing/recording unit ex400. The reading and writing processes can be implemented by the system control unit ex407 using various information stored in the buffer ex404 and generating and adding new information as necessary, and by the modulation recording unit ex402, the reproduction demodulating unit ex403, and the servo control unit ex406 that record and reproduce information through the optical head ex401 while being operated in a coordinated manner. The system control unit ex407 includes, for example, a microprocessor, and executes processing by causing a computer to execute a program for read and write.

Although the optical head ex401 irradiates a laser spot in the description, it may perform high-density recording using near field light.

FIG. 30 illustrates the recording medium ex215 that is the optical disk. On the recording surface of the recording medium ex215, guide grooves are spirally formed, and an information track ex230 records, in advance, address information indicating an absolute position on the disk according to change in a shape of the guide grooves. The address information includes information for determining positions of recording blocks ex231 that are a unit for recording data. Reproducing the information track ex230 and reading the address information in an apparatus that records and reproduces data can lead to determination of the positions of the recording blocks. Furthermore, the recording medium ex215 includes a data recording area ex233, an inner circumference area ex232, and an outer circumference area ex234. The data recording area ex233 is an area for use in recording the user data. The inner circumference area ex232 and the outer circumference area ex234 that are inside and outside of the data recording area ex233, respectively, are for specific use except for recording the user data. The information reproducing/recording unit ex400 reads and writes coded audio, coded video data, or multiplexed data obtained by multiplexing the coded audio and video data, from and on the data recording area ex233 of the recording medium ex215. Although an optical disk having a layer, such as a DVD and a BD is described as an example in the description, the optical disk is not limited to such, and may be an optical disk having a multilayer structure and capable of being recorded on a part other than the surface. Furthermore, the optical disk may have a structure for multidimensional recording/reproduction, such as recording of information using light of colors with different wavelengths in the same portion of the optical disk and for recording information having different layers from various angles.

Furthermore, a car ex210 having an antenna ex205 can receive data from the satellite ex202 and others, and reproduce video on a display device such as a car navigation system ex211 set in the car ex210, in the digital broadcasting system ex200. Here, a configuration of the car navigation system ex211 will be a configuration, for example, including a GPS receiving unit from the configuration illustrated in FIG. 28. The same will be true for the configuration of the computer ex111, the cellular phone ex114, and others.

FIG. 31A illustrates the cellular phone ex114 that uses the video coding method and the video decoding method described in Embodiments. The cellular phone ex114 includes: an antenna ex350 for transmitting and receiving radio waves through the base station ex110; a camera unit ex365 capable of capturing moving and still images; and a display unit ex358 such as a liquid crystal display for displaying the data such as decoded video captured by the camera unit ex365 or received by the antenna ex350. The cellular phone ex114 further includes: a main body unit including an operation key unit ex366; an audio output unit ex357 such as a speaker for output of audio; an audio input unit ex356 such as a microphone for input of audio; a memory unit ex367 for storing captured video or still pictures, recorded audio, coded or decoded data of the received video, the still pictures, e-mails, or others; and a slot unit ex364 that is an interface unit for a recording medium that stores data in the same manner as the memory unit ex367.

Next, an example of a configuration of the cellular phone ex114 will be described with reference to FIG. 31B. In the cellular phone ex114, a main control unit ex360 designed to control overall each unit of the main body including the display unit ex358 as well as the operation key unit ex366 is connected mutually, via a synchronous bus ex370, to a power supply circuit unit ex361, an operation input control unit ex362, a video signal processing unit ex355, a camera interface unit ex363, a liquid crystal display (LCD) control unit ex359, a modulation/demodulation unit ex352, a multiplexing/demultiplexing unit ex353, an audio signal processing unit ex354, the slot unit ex364, and the memory unit ex367.

When a call-end key or a power key is turned ON by a user's operation, the power supply circuit unit ex361 supplies the respective units with power from a battery pack so as to activate the cellular phone ex114.

In the cellular phone ex114, the audio signal processing unit ex354 converts the audio signals collected by the audio input unit ex356 in voice conversation mode into digital audio signals under the control of the main control unit ex360 including a CPU, ROM, and RAM. Then, the modulation/demodulation unit ex352 performs spread spectrum processing on the digital audio signals, and the transmitting and receiving unit ex351 performs digital-to-analog conversion and frequency conversion on the data, so as to transmit the resulting data via the antenna ex350.

Also, in the cellular phone ex114, the transmitting and receiving unit ex351 amplifies the data received by the antenna ex350 in voice conversation mode and performs frequency conversion and the analog-to-digital conversion on the data. Then, the modulation/demodulation unit ex352 performs inverse spread spectrum processing on the data, and the audio signal processing unit ex354 converts it into analog audio signals, so as to output them via the audio output unit ex357.

Furthermore, when an e-mail in data communication mode is transmitted, text data of the e-mail inputted by operating the operation key unit ex366 and others of the main body is sent out to the main control unit ex360 via the operation input control unit ex362. The main control unit ex360 causes the modulation/demodulation unit ex352 to perform spread spectrum processing on the text data, and the transmitting and receiving unit ex351 performs the digital-to-analog conversion and the frequency conversion on the resulting data to transmit the data to the base station ex110 via the antenna ex350. When an e-mail is received, processing that is approximately inverse to the processing for transmitting an e-mail is performed on the received data, and the resulting data is provided to the display unit ex358.

When video, still images, or video and audio in data communication mode is or are transmitted, the video signal processing unit ex355 compresses and codes video signals supplied from the camera unit ex365 using the video coding method shown in each of Embodiments, and transmits the coded video data to the multiplexing/demultiplexing unit ex353. In contrast, while the camera unit ex365 captures video, still images, and others, the audio signal processing unit ex354 codes audio signals collected by the audio input unit ex356, and transmits the coded audio data to the multiplexing/demultiplexing unit ex353.

The multiplexing/demultiplexing unit ex353 multiplexes the coded video data supplied from the video signal processing unit ex355 and the coded audio data supplied from the audio signal processing unit ex354, using a predetermined method.

Then, the modulation/demodulation unit ex352 performs spread spectrum processing on the multiplexed data, and the transmitting and receiving unit ex351 performs digital-to-analog conversion and frequency conversion on the data so as to transmit the resulting data via the antenna ex350.

When receiving data of a video file which is linked to a Web page and others in data communication mode or when receiving an e-mail with video and/or audio attached, in order to decode the multiplexed data received via the antenna ex350, the multiplexing/demultiplexing unit ex353 demultiplexes the multiplexed data into a video data bit stream and an audio data bit stream, and supplies the video signal processing unit ex355 with the coded video data and the audio signal processing unit ex354 with the coded audio data, through the synchronous bus ex370. The video signal processing unit ex355 decodes the video signal using a video decoding method corresponding to the coding method shown in each of Embodiments, and then the display unit ex358 displays, for instance, the video and still images included in the video file linked to the Web page via the LCD control unit ex359. Furthermore, the audio signal processing unit ex354 decodes the audio signal, and the audio output unit ex357 provides the audio.

Furthermore, similarly to the television ex300, a terminal such as the cellular phone ex114 probably has 3 types of implementation configurations including not only (i) a transmitting and receiving terminal including both a coding apparatus and a decoding apparatus, but also (ii) a transmitting terminal including only a coding apparatus and (iii) a receiving terminal including only a decoding apparatus. Although the digital broadcasting system ex200 receives and transmits the multiplexed data obtained by multiplexing audio data onto video data in the description, the multiplexed data may be data obtained by multiplexing not audio data but character data related to video onto video data, and may be not multiplexed data but video data itself.

As such, the video coding method and the video decoding method in each of Embodiments can be used in any of the devices and systems described. Thus, the advantages described in each of Embodiments can be obtained.

Furthermore, the present invention is not limited to Embodiments, and various modifications and revisions are possible without departing from the scope of the present invention.

Embodiment 7

Video data can be generated by switching, as necessary, between (i) the video coding method or the video coding apparatus shown in each of Embodiments and (ii) a video coding method or a video coding apparatus in conformity with a different standard, such as MPEG-2, MPEG4-AVC, and VC-1.

Here, when a plurality of video data that conforms to the different standards is generated and is then decoded, the decoding methods need to be selected to conform to the different standards. However, since the standard to which each of the plurality of the video data to be decoded conforms cannot be detected, there is a problem that an appropriate decoding method cannot be selected.

In order to solve the problem, multiplexed data obtained by multiplexing audio data and others onto video data has a structure including identification information indicating to which standard the video data conforms. The specific structure of the multiplexed data including the video data generated in the video coding method and by the video coding apparatus shown in each of Embodiments will be hereinafter described. The multiplexed data is a digital stream in the MPEG2-Transport Stream format.

FIG. 32 illustrates a structure of the multiplexed data. As illustrated in FIG. 32, the multiplexed data can be obtained by multiplexing at least one of a video stream, an audio stream, a presentation graphics stream (PG), and an interactive graphics stream (IG). The video stream represents primary video and secondary video of a movie, the audio stream represents a primary audio part and a secondary audio part to be mixed with the primary audio part, and the presentation graphics stream represents subtitles of the movie. Here, the primary video is normal video to be displayed on a screen, and the secondary video is video to be displayed on a smaller window in the primary video. Furthermore, the interactive graphics stream represents an interactive screen to be generated by arranging the GUI components on a screen. The video stream is coded in the video coding method or by the video coding apparatus shown in each of Embodiments, or in a video coding method or by a video coding apparatus in conformity with a conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1. The audio stream is coded in accordance with a standard, such as Dolby-AC-3, Dolby Digital Plus, MLP, DTS, DTS-HD, and linear PCM.

Each stream included in the multiplexed data is identified by PID. For example, 0x1011 is allocated to the video stream to be used for video of a movie, 0x1100 to 0x111F are allocated to the audio streams, 0x1200 to 0x121F are allocated to the presentation graphics streams, 0x1400 to 0x141F are allocated to the interactive graphics streams, 0x1B00 to 0x1B1F are allocated to the video streams to be used for secondary video of the movie, and 0x1A00 to 0x1A1F are allocated to the audio streams to be used for the secondary audio to be mixed with the primary audio.
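
As a non-limiting illustration, the PID allocation described above can be expressed as a range look-up. The sketch below is not part of the described apparatus; the table name PID_RANGES, the function name classify_pid, and the category labels are assumptions introduced only for this example.

```python
# Hypothetical table: (first PID, last PID, label). The ranges are those
# listed in the text; the labels are illustrative names only.
PID_RANGES = [
    (0x1011, 0x1011, "primary video"),
    (0x1100, 0x111F, "audio"),
    (0x1200, 0x121F, "presentation graphics"),
    (0x1400, 0x141F, "interactive graphics"),
    (0x1A00, 0x1A1F, "secondary audio"),
    (0x1B00, 0x1B1F, "secondary video"),
]


def classify_pid(pid: int) -> str:
    """Return the stream category for a PID, or "unknown" if unlisted."""
    for first, last, label in PID_RANGES:
        if first <= pid <= last:
            return label
    return "unknown"
```

A demultiplexer could, for example, call classify_pid on the PID of each TS packet to route the packet to the matching decoder.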

FIG. 33 schematically illustrates how data is multiplexed. First, a video stream ex235 composed of video frames and an audio stream ex238 composed of audio frames are transformed into a stream of PES packets ex236 and a stream of PES packets ex239, and further into TS packets ex237 and TS packets ex240, respectively. Similarly, data of a presentation graphics stream ex241 and data of an interactive graphics stream ex244 are transformed into a stream of PES packets ex242 and a stream of PES packets ex245, and further into TS packets ex243 and TS packets ex246, respectively. These TS packets are multiplexed into a stream to obtain multiplexed data ex247.

FIG. 34 illustrates how a video stream is stored in a stream of PES packets in more detail. The first bar in FIG. 34 shows a video frame stream in a video stream. The second bar shows the stream of PES packets. As indicated by arrows denoted as yy1, yy2, yy3, and yy4 in FIG. 34, the video stream is divided into pictures as I pictures, B pictures, and P pictures each of which is a video presentation unit, and the pictures are stored in a payload of each of the PES packets. Each of the PES packets has a PES header, and the PES header stores a Presentation Time-Stamp (PTS) indicating a display time of the picture, and a Decoding Time-Stamp (DTS) indicating a decoding time of the picture.

FIG. 35 illustrates a format of TS packets to be finally written on the multiplexed data. Each of the TS packets is a 188-byte fixed length packet including a 4-byte TS header having information, such as a PID for identifying a stream, and a 184-byte TS payload for storing data. The PES packets are divided, and stored in the TS payloads, respectively. When a BD ROM is used, each of the TS packets is given a 4-byte TP_Extra_Header, thus resulting in 192-byte source packets. The source packets are written on the multiplexed data. The TP_Extra_Header stores information such as an Arrival_Time_Stamp (ATS). The ATS shows a transfer start time at which each of the TS packets is to be transferred to a PID filter. The source packets are arranged in the multiplexed data as shown at the bottom of FIG. 35. The numbers incrementing from the head of the multiplexed data are called source packet numbers (SPNs).
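
As a non-limiting illustration, the packet sizes described above can be sketched as follows. The function names split_source_packets and ts_pid are assumptions introduced only for this example. The position of the 13-bit PID within the TS header follows the MPEG-2 Transport Stream format, and the internal bit layout of the TP_Extra_Header is deliberately left uninterpreted.

```python
TS_PACKET_SIZE = 188        # 4-byte TS header + 184-byte TS payload
EXTRA_HEADER_SIZE = 4       # TP_Extra_Header carrying the ATS
SOURCE_PACKET_SIZE = TS_PACKET_SIZE + EXTRA_HEADER_SIZE  # 192 bytes


def split_source_packets(data: bytes):
    """Yield (spn, tp_extra_header, ts_packet) for each whole source packet.

    The source packet number (SPN) counts up from the head of the data.
    """
    for spn in range(len(data) // SOURCE_PACKET_SIZE):
        start = spn * SOURCE_PACKET_SIZE
        packet = data[start:start + SOURCE_PACKET_SIZE]
        yield spn, packet[:EXTRA_HEADER_SIZE], packet[EXTRA_HEADER_SIZE:]


def ts_pid(ts_packet: bytes) -> int:
    """Extract the 13-bit PID from bytes 1 and 2 of the TS header."""
    return ((ts_packet[1] & 0x1F) << 8) | ts_packet[2]
```

Keeping the TP_Extra_Header as raw bytes avoids committing to a particular ATS bit layout in this sketch.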

Each of the TS packets included in the multiplexed data includes not only streams of audio, video, subtitles and others, but also a Program Association Table (PAT), a Program Map Table (PMT), and a Program Clock Reference (PCR). The PAT shows what a PID in a PMT used in the multiplexed data indicates, and a PID of the PAT itself is registered as zero. The PMT stores PIDs of the streams of video, audio, subtitles and others included in the multiplexed data, and attribute information of the streams corresponding to the PIDs. The PMT also has various descriptors relating to the multiplexed data. The descriptors have information such as copy control information showing whether copying of the multiplexed data is permitted or not. The PCR stores STC time information corresponding to an ATS showing when the PCR packet is transferred to a decoder, in order to achieve synchronization between an Arrival Time Clock (ATC) that is a time axis of ATSs, and a System Time Clock (STC) that is a time axis of PTSs and DTSs.

FIG. 36 illustrates the data structure of the PMT in detail. A PMT header is disposed at the top of the PMT. The PMT header describes the length of data included in the PMT and others. A plurality of descriptors relating to the multiplexed data is disposed after the PMT header. Information such as the copy control information is described in the descriptors. After the descriptors, a plurality of pieces of stream information relating to the streams included in the multiplexed data is disposed. Each piece of stream information includes stream descriptors each describing information, such as a stream type for identifying a compression codec of a stream, a stream PID, and stream attribute information (such as a frame rate or an aspect ratio). The stream descriptors are equal in number to the number of streams in the multiplexed data.

When the multiplexed data is recorded on a recording medium and others, it is recorded together with multiplexed data information files.

Each of the multiplexed data information files is management information of the multiplexed data as shown in FIG. 37. The multiplexed data information files are in one to one correspondence with the multiplexed data, and each of the files includes multiplexed data information, stream attribute information, and an entry map.

As illustrated in FIG. 37, the multiplexed data information includes a system rate, a reproduction start time, and a reproduction end time. The system rate indicates the maximum transfer rate at which a system target decoder to be described later transfers the multiplexed data to a PID filter. The intervals of the ATSs included in the multiplexed data are set so that the transfer rate is not higher than the system rate. The reproduction start time indicates a PTS in a video frame at the head of the multiplexed data. An interval of one frame is added to a PTS in a video frame at the end of the multiplexed data, and the resulting value is set to the reproduction end time.
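
As a non-limiting illustration, the reproduction start time and the reproduction end time described above can be computed as sketched below. The function name reproduction_times is an assumption introduced only for this example, and the sketch assumes that PTS values are counted in units of the 90 kHz clock used for PTSs in MPEG-2 systems.

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000  # assumed PTS resolution (90 kHz ticks)


def reproduction_times(first_pts: int, last_pts: int, frame_rate) -> tuple:
    """Return (start, end) in PTS ticks.

    start is the PTS of the video frame at the head of the data;
    end is the PTS of the last video frame plus one frame interval.
    """
    frame_interval = Fraction(PTS_CLOCK_HZ) / Fraction(frame_rate)
    return first_pts, round(last_pts + frame_interval)
```

For example, with a frame rate of 30 frames per second, one frame interval is 3000 ticks, so a final PTS of 900000 gives a reproduction end time of 903000.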

As shown in FIG. 38, a piece of attribute information is registered in the stream attribute information, for each PID of each stream included in the multiplexed data. Each piece of attribute information has different information depending on whether the corresponding stream is a video stream, an audio stream, a presentation graphics stream, or an interactive graphics stream. Each piece of video stream attribute information carries information including what kind of compression codec is used for compressing the video stream, and the resolution, aspect ratio and frame rate of the pieces of picture data that are included in the video stream. Each piece of audio stream attribute information carries information including what kind of compression codec is used for compressing the audio stream, how many channels are included in the audio stream, which language the audio stream supports, and how high the sampling frequency is. The video stream attribute information and the audio stream attribute information are used for initialization of a decoder before the player plays back the information.

In Embodiment 7, the multiplexed data to be used is of a stream type included in the PMT. Furthermore, when the multiplexed data is recorded on a recording medium, the video stream attribute information included in the multiplexed data information is used. More specifically, the video coding method or the video coding apparatus described in each of Embodiments includes a step or a unit for allocating unique information indicating video data generated by the video coding method or the video coding apparatus in each of Embodiments, to the stream type included in the PMT or the video stream attribute information. With the configuration, the video data generated by the video coding method or the video coding apparatus described in each of Embodiments can be distinguished from video data that conforms to another standard.

Furthermore, FIG. 39 illustrates steps of the video decoding method according to Embodiment 7. In Step exS100, the stream type included in the PMT or the video stream attribute information is obtained from the multiplexed data. Next, in Step exS101, it is determined whether or not the stream type or the video stream attribute information indicates that the multiplexed data is generated by the video coding method or the video coding apparatus in each of Embodiments. When it is determined that the stream type or the video stream attribute information indicates that the multiplexed data is generated by the video coding method or the video coding apparatus in each of Embodiments, in Step exS102, decoding is performed by the video decoding method in each of Embodiments. Furthermore, when the stream type or the video stream attribute information indicates conformance to the conventional standards, such as MPEG-2, MPEG4-AVC, and VC-1, in Step exS103, decoding is performed by a video decoding method in conformity with the conventional standards.
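
As a non-limiting illustration, Steps exS100 to exS103 can be sketched as a simple dispatch on the identification information. All names in the sketch (Standard, decode, and the two decoder stubs) are assumptions introduced only for this example, and the enumeration values are placeholders rather than actual stream type values.

```python
from enum import Enum


class Standard(Enum):
    # Placeholder identifiers, not actual stream type values.
    EMBODIMENTS = "embodiments"
    MPEG2 = "MPEG-2"
    MPEG4_AVC = "MPEG4-AVC"
    VC1 = "VC-1"


def decode_with_embodiment_method(data: bytes):
    """Stub standing in for the video decoding method of Embodiments."""
    return ("embodiments decoder", data)


def decode_with_conventional_method(standard: Standard, data: bytes):
    """Stub standing in for a decoder of a conventional standard."""
    return (standard.value + " decoder", data)


def decode(identification: Standard, data: bytes):
    """Select a decoder from identification obtained in Step exS100."""
    # Step exS101: was the data generated by the method of Embodiments?
    if identification is Standard.EMBODIMENTS:
        # Step exS102: decode by the method of Embodiments.
        return decode_with_embodiment_method(data)
    # Step exS103: decode by the matching conventional method.
    return decode_with_conventional_method(identification, data)
```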

As such, allocating a new unique value to the stream type or the video stream attribute information enables determination whether or not the video decoding method or the video decoding apparatus that is described in each of Embodiments can perform decoding. Even when multiplexed data that conforms to a different standard is input, an appropriate decoding method or apparatus can be selected. Thus, it becomes possible to decode information without any error. Furthermore, the video coding method or apparatus, or the video decoding method or apparatus in Embodiment 7 can be used in the devices and systems described above.

Embodiment 8

Each of the video coding method, the video coding apparatus, the video decoding method, and the video decoding apparatus in each of Embodiments is typically achieved in the form of an integrated circuit or a Large Scale Integrated (LSI) circuit. As an example of the LSI, FIG. 40 illustrates a configuration of the LSI ex500 that is made into one chip. The LSI ex500 includes elements ex501, ex502, ex503, ex504, ex505, ex506, ex507, ex508, and ex509 to be described below, and the elements are connected to each other through a bus ex510. The power supply circuit unit ex505 is activated by supplying each of the elements with power when the power supply circuit unit ex505 is turned on.

For example, when coding is performed, the LSI ex500 receives an AV signal from a microphone ex117, a camera ex113, and others through an AV IO ex509 under control of a control unit ex501 including a CPU ex502, a memory controller ex503, a stream controller ex504, and a driving frequency control unit ex512. The received AV signal is temporarily stored in an external memory ex511, such as an SDRAM. Under control of the control unit ex501, the stored data is segmented into data portions according to the processing amount and speed to be transmitted to a signal processing unit ex507. Then, the signal processing unit ex507 codes an audio signal and/or a video signal. Here, the coding of the video signal is the coding described in each of Embodiments. Furthermore, the signal processing unit ex507 sometimes multiplexes the coded audio data and the coded video data, and a stream IO ex506 provides the multiplexed data outside. The provided multiplexed data is transmitted to the base station ex107, or written on the recording media ex215. When data sets are multiplexed, the data should be temporarily stored in the buffer ex508 so that the data sets are synchronized with each other.

Although the memory ex511 is an element outside the LSI ex500, it may be included in the LSI ex500. The buffer ex508 is not limited to one buffer, but may be composed of a plurality of buffers. Furthermore, the LSI ex500 may be made into one chip or a plurality of chips.

Furthermore, although the control unit ex501 includes the CPU ex502, the memory controller ex503, the stream controller ex504, and the driving frequency control unit ex512, the configuration of the control unit ex501 is not limited to such. For example, the signal processing unit ex507 may further include a CPU. Inclusion of another CPU in the signal processing unit ex507 can improve the processing speed. Furthermore, as another example, the CPU ex502 may serve as or be a part of the signal processing unit ex507, and, for example, may include an audio signal processing unit. In such a case, the control unit ex501 includes the signal processing unit ex507 or the CPU ex502 including a part of the signal processing unit ex507.

The name used here is LSI, but it may also be called IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.

Moreover, ways to achieve integration are not limited to the LSI, and a special circuit or a general purpose processor and so forth can also achieve the integration. Field Programmable Gate Array (FPGA) that can be programmed after manufacturing LSIs or a reconfigurable processor that allows re-configuration of the connection or configuration of an LSI can be used for the same purpose.

In the future, with advancement in semiconductor technology, a brand-new technology may replace LSI. The functional blocks can be integrated using such a technology. The possibility is that the present invention is applied to biotechnology.

Embodiment 9

When video data generated in the video coding method or by the video coding apparatus described in each of Embodiments is decoded, compared to when video data that conforms to a conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1, is decoded, the processing amount probably increases. Thus, the LSI ex500 needs to be set to a driving frequency higher than that of the CPU ex502 to be used when video data in conformity with the conventional standard is decoded. However, when the driving frequency is set higher, there is a problem that the power consumption increases.

In order to solve the problem, the video decoding apparatus, such as the television ex300 and the LSI ex500, is configured to determine to which standard the video data conforms, and switch between the driving frequencies according to the determined standard. FIG. 41 illustrates a configuration ex800 in Embodiment 9. A driving frequency switching unit ex803 sets a driving frequency to a higher driving frequency when video data is generated by the video coding method or the video coding apparatus described in each of Embodiments. Then, the driving frequency switching unit ex803 instructs a decoding processing unit ex801 that executes the video decoding method described in each of Embodiments to decode the video data. When the video data conforms to the conventional standard, the driving frequency switching unit ex803 sets a driving frequency to a lower driving frequency than that of the video data generated by the video coding method or the video coding apparatus described in each of Embodiments. Then, the driving frequency switching unit ex803 instructs the decoding processing unit ex802 that conforms to the conventional standard to decode the video data.

More specifically, the driving frequency switching unit ex803 includes the CPU ex502 and the driving frequency control unit ex512 in FIG. 40. Here, each of the decoding processing unit ex801 that executes the video decoding method described in each of Embodiments and the decoding processing unit ex802 that conforms to the conventional standard corresponds to the signal processing unit ex507 in FIG. 40. The CPU ex502 determines to which standard the video data conforms. Then, the driving frequency control unit ex512 determines a driving frequency based on a signal from the CPU ex502. Furthermore, the signal processing unit ex507 decodes the video data based on the signal from the CPU ex502. For example, the identification information described in Embodiment 7 is probably used for identifying the video data. The identification information is not limited to the one described in Embodiment 7 but may be any information as long as the information indicates to which standard the video data conforms. For example, when the standard to which the video data conforms can be determined based on an external signal for determining that the video data is used for a television or a disk, etc., the determination may be made based on such an external signal. Furthermore, the CPU ex502 selects a driving frequency based on, for example, a look-up table in which the standards of the video data are associated with the driving frequencies as shown in FIG. 43. The driving frequency can be selected by storing the look-up table in the buffer ex508 and in an internal memory of an LSI, and with reference to the look-up table by the CPU ex502.

FIG. 42 illustrates steps for executing a method in Embodiment 9. First, in Step exS200, the signal processing unit ex507 obtains identification information from the multiplexed data. Next, in Step exS201, the CPU ex502 determines whether or not the video data is generated by the coding method and the coding apparatus described in each of Embodiments, based on the identification information. When the video data is generated by the video coding method and the video coding apparatus described in each of Embodiments, in Step exS202, the CPU ex502 transmits a signal for setting the driving frequency to a higher driving frequency to the driving frequency control unit ex512. Then, the driving frequency control unit ex512 sets the driving frequency to the higher driving frequency. On the other hand, when the identification information indicates that the video data conforms to the conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1, in Step exS203, the CPU ex502 transmits a signal for setting the driving frequency to a lower driving frequency to the driving frequency control unit ex512. Then, the driving frequency control unit ex512 sets the driving frequency to the lower driving frequency than that in the case where the video data is generated by the video coding method and the video coding apparatus described in each of Embodiments.
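
As a non-limiting illustration, the selection of the driving frequency in Steps exS201 to exS203 can be sketched with a look-up table that associates each standard with a driving frequency. The table name, the function name, and the frequency values below are hypothetical and are introduced only for this example; they are not values taken from the description or from the figures.

```python
# Hypothetical look-up table: standard -> driving frequency in MHz.
# The numbers are placeholders chosen only so that the frequency for
# video data of Embodiments is higher than for conventional standards.
DRIVING_FREQUENCY_MHZ = {
    "embodiments": 500,
    "MPEG-2": 350,
    "MPEG4-AVC": 350,
    "VC-1": 350,
}


def select_driving_frequency(standard: str) -> int:
    """Return the driving frequency associated with the identified standard.

    Raises KeyError when the standard is not in the look-up table.
    """
    return DRIVING_FREQUENCY_MHZ[standard]
```

Holding the association in a table rather than in branching logic allows the order of the frequencies to be changed without altering the selection step.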

Furthermore, along with the switching of the driving frequencies, the power conservation effect can be improved by changing the voltage to be applied to the LSI ex500 or an apparatus including the LSI ex500. For example, when the driving frequency is set lower, the voltage to be applied to the LSI ex500 or the apparatus including the LSI ex500 is probably set to a voltage lower than that in the case where the driving frequency is set higher.

Furthermore, when the processing amount for decoding is larger, the driving frequency may be set higher, and when the processing amount for decoding is smaller, the driving frequency may be set lower as the method for setting the driving frequency. Thus, the setting method is not limited to the ones described above. For example, when the processing amount for decoding video data in conformity with MPEG4-AVC is larger than the processing amount for decoding video data generated by the video coding method and the video coding apparatus described in each of Embodiments, the driving frequency is probably set in reverse order to the setting described above.

Furthermore, the method for setting the driving frequency is not limited to the method for setting the driving frequency lower. For example, when the identification information indicates that the video data is generated by the video coding method and the video coding apparatus described in each of Embodiments, the voltage to be applied to the LSI ex500 or the apparatus including the LSI ex500 is probably set higher. When the identification information indicates that the video data conforms to the conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1, the voltage to be applied to the LSI ex500 or the apparatus including the LSI ex500 is probably set lower. As another example, when the identification information indicates that the video data is generated by the video coding method and the video coding apparatus described in each of Embodiments, the driving of the CPU ex502 probably does not have to be suspended. When the identification information indicates that the video data conforms to the conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1, the driving of the CPU ex502 is probably suspended at a given time because the CPU ex502 has extra processing capacity. Even when the identification information indicates that the video data is generated by the video coding method and the video coding apparatus described in each of Embodiments, in the case where the CPU ex502 has extra processing capacity, the driving of the CPU ex502 is probably suspended at a given time. In such a case, the suspending time is probably set shorter than that in the case where the identification information indicates that the video data conforms to the conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1.

Accordingly, the power conservation effect can be improved by switching between the driving frequencies in accordance with the standard to which the video data conforms. Furthermore, when the LSI ex500 or the apparatus including the LSI ex500 is driven using a battery, the battery life can be extended with the power conservation effect.

Embodiment 10

There are cases where a plurality of video data that conforms to different standards is provided to the devices and systems, such as a television and a mobile phone. In order to enable decoding the plurality of video data that conforms to the different standards, the signal processing unit ex507 of the LSI ex500 needs to conform to the different standards. However, the problems of increase in the scale of the circuit of the LSI ex500 and increase in the cost arise with the individual use of the signal processing units ex507 that conform to the respective standards.

In order to solve the problem, what is conceived is a configuration in which the decoding processing unit for implementing the video decoding method described in each of Embodiments and the decoding processing unit that conforms to the conventional standard, such as MPEG-2, MPEG4-AVC, and VC-1, are partly shared. Ex900 in FIG. 44A shows an example of the configuration. For example, the video decoding method described in each of Embodiments and the video decoding method that conforms to MPEG4-AVC have, partly in common, the details of processing, such as entropy coding, inverse quantization, deblocking filtering, and motion compensated prediction. The details of processing to be shared probably include use of a decoding processing unit ex902 that conforms to MPEG4-AVC. In contrast, a dedicated decoding processing unit ex901 is probably used for other processing unique to the present invention. Since the present invention is characterized by a transformation unit in particular, for example, the dedicated decoding processing unit ex901 is used for inverse transform. Otherwise, the decoding processing unit is probably shared for one of the entropy coding, inverse quantization, deblocking filtering, and motion compensated prediction, or all of the processing. The decoding processing unit for implementing the video decoding method described in each of Embodiments may be shared for the processing to be shared, and a dedicated decoding processing unit may be used for processing unique to that of MPEG4-AVC.

Furthermore, ex1000 in FIG. 44B shows another example in which processing is partly shared. This example uses a configuration including a dedicated decoding processing unit ex1001 that supports the processing unique to the present invention, a dedicated decoding processing unit ex1002 that supports the processing unique to another conventional standard, and a decoding processing unit ex1003 that supports processing to be shared between the video decoding method in the present invention and the conventional video decoding method. Here, the dedicated decoding processing units ex1001 and ex1002 are not necessarily specialized for the processing of the present invention and the processing of the conventional standard, respectively, and may be units capable of implementing general processing. Furthermore, the configuration of Embodiment 10 can be implemented by the LSI ex500.
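The partly shared configuration of FIG. 44B can be pictured with the following minimal sketch. It is illustrative only and is not taken from the Embodiments: the stage names, their order, the choice of inverse transform as the only non-shared stage, and the function names `select_unit` and `plan` are all assumptions made for the example. Only the unit labels ex1001, ex1002, and ex1003 come from the description above.

```python
# Illustrative sketch only: stage names, stage order, and the function names
# are assumptions; only the unit labels ex1001/ex1002/ex1003 come from FIG. 44B.

# Stages assumed to be common to both decoding methods (handled by ex1003).
SHARED_STAGES = {
    "entropy_decoding",
    "inverse_quantization",
    "motion_compensated_prediction",
    "deblocking_filtering",
}

# Assumed decoding order; inverse transform is treated as the unique stage.
PIPELINE = [
    "entropy_decoding",
    "inverse_quantization",
    "inverse_transform",
    "motion_compensated_prediction",
    "deblocking_filtering",
]


def select_unit(standard: str, stage: str) -> str:
    """Return the label of the processing unit that handles a stage."""
    if stage in SHARED_STAGES:
        return "ex1003"  # shared decoding processing unit
    # Non-shared stage: use the dedicated unit for the stream's standard.
    return "ex1001" if standard == "invention" else "ex1002"


def plan(standard: str) -> list[tuple[str, str]]:
    """List (stage, unit) pairs for decoding a stream of the given standard."""
    return [(stage, select_unit(standard, stage)) for stage in PIPELINE]


if __name__ == "__main__":
    for standard in ("invention", "MPEG4-AVC"):
        print(standard, plan(standard))
```

In this sketch three units cover both standards, whereas two fully independent decoders would duplicate every shared stage, which is the circuit-scale saving the configuration aims at.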

As such, reducing the scale of the circuit of an LSI and reducing the cost are possible by sharing the decoding processing unit for the processing to be shared between the video decoding method in the present invention and the video decoding method in conformity with the conventional standard.

INDUSTRIAL APPLICABILITY

The present invention is applicable to a coding apparatus which codes audio, still images, and video and to a decoding apparatus which decodes data coded by the coding apparatus. For example, the present invention is applicable to various audio-visual devices such as audio devices, cellular phones, digital cameras, BD recorders, and digital televisions.

1-20. (canceled)
21. A method of encoding video using a plurality of reference pictures respectively associated with a plurality of indexes, the method comprising: creating a list including the plurality of reference pictures, the plurality of reference pictures including (i) a reference picture of a particular type of reference picture, the reference picture of the particular type being included in a first group of reference pictures and (ii) a reference picture that is not of the particular type of reference picture; and encoding a current picture of the video using the list, wherein the creating the list includes: selecting, from among the plurality of reference pictures, the first group of reference pictures based on a parameter indicating a temporal level of the reference picture of the particular type of reference picture, the first group of reference pictures including the reference picture of the particular type of reference picture, a frame rate corresponding to the temporal level of each of the reference pictures included in the first group of reference pictures indicating a lowest frame rate from among frame rates of the plurality of reference pictures; selecting, from among the plurality of reference pictures, a second group of reference pictures, the second group of reference pictures including the reference picture that is not of the particular type of reference picture; placing the first group of reference pictures at a predetermined position in the list, the first group of reference pictures being selected based on the parameter indicating the temporal level of the reference picture of the particular type of reference picture, the temporal level of each of the reference pictures included in the first group of reference pictures being lower than the temporal level of each of the reference pictures included in the second group of reference pictures, a frame rate of the first group of reference pictures being lower than the frame rate of the second group of reference pictures; and placing the second group of reference pictures after the first group of reference pictures in the list, each of the reference pictures included in the first group of reference pictures preceding each of the reference pictures in the second reference pictures in the list.
22. The method according to claim 21, wherein the reference picture that is not of the particular type of reference picture is located closest to the current picture.
23. A method of decoding video using a plurality of reference pictures respectively associated with a plurality of indexes, the method comprising: creating a list including the plurality of reference pictures, the plurality of reference pictures including (i) a reference picture of a particular type of reference picture, the reference picture of the particular type being included in a first group of reference pictures and (ii) a reference picture that is not of the particular type of reference picture; and decoding a current picture of the video using the list, wherein the creating the list includes: selecting, from among the plurality of reference pictures, the first group of reference pictures based on a parameter indicating a temporal level of the reference picture of the particular type of reference picture, the first group of reference pictures including the reference picture of the particular type of reference picture, a frame rate corresponding to the temporal level of each of the reference pictures included in the first group of reference pictures indicating a lowest frame rate from among frame rates of the plurality of reference pictures; selecting, from among the plurality of reference pictures, a second group of reference pictures, the second group of reference pictures including the reference picture that is not of the particular type of reference picture; placing the first group of reference pictures at a predetermined position in the list, the first group of reference pictures being selected based on the parameter indicating the temporal level of the reference picture of the particular type of reference picture, the temporal level of each of the reference pictures included in the first group of reference pictures being lower than the temporal level of each of the reference pictures included in the second group of reference pictures, a frame rate of the first group of reference pictures being lower than the frame rate of the second group of reference pictures; and placing the second group of reference pictures after the first group of reference pictures in the list, each of the reference pictures included in the first group of reference pictures preceding each of the reference pictures in the second reference pictures in the list.
24. The method according to claim 23, wherein the reference picture that is not of the particular type of reference picture is located closest to the current picture.
25. An apparatus for encoding video using a plurality of reference pictures respectively associated with a plurality of indexes, the apparatus comprising: a non-transitory memory; and a processor coupled to the non-transitory memory, the processor performing: creating a list including the plurality of reference pictures, the plurality of reference pictures including (i) a reference picture of a particular type of reference picture, the reference picture of the particular type being included in a first group of reference pictures and (ii) a reference picture that is not of the particular type of reference picture; and encoding a current picture of the video using the list, wherein the creating the list includes: selecting, from among the plurality of reference pictures, the first group of reference pictures based on a parameter indicating a temporal level of the reference picture of the particular type of reference picture, the first group of reference pictures including the reference picture of the particular type of reference picture, a frame rate corresponding to the temporal level of each of the reference pictures included in the first group of reference pictures indicating a lowest frame rate from among frame rates of the plurality of reference pictures; selecting, from among the plurality of reference pictures, a second group of reference pictures, the second group of reference pictures including the reference picture that is not of the particular type of reference picture; placing the first group of reference pictures at a predetermined position in the list, the first group of reference pictures being selected based on the parameter indicating the temporal level of the reference picture of the particular type of reference picture, the temporal level of each of the reference pictures included in the first group of reference pictures being lower than the temporal level of each of the reference pictures included in the second group of reference pictures, a frame rate of the first group of reference pictures being lower than the frame rate of the second group of reference pictures; and placing the second group of reference pictures after the first group of reference pictures in the list, each of the reference pictures included in the first group of reference pictures preceding each of the reference pictures in the second reference pictures in the list.
26. The apparatus according to claim 25, wherein the reference picture that is not of the particular type of reference picture is located closest to the current picture.
27. An apparatus for decoding video using a plurality of reference pictures respectively associated with a plurality of indexes, the apparatus comprising: a non-transitory memory; and a processor coupled to the non-transitory memory, the processor performing: creating a list including the plurality of reference pictures, the plurality of reference pictures including (i) a reference picture of a particular type of reference picture, the reference picture of the particular type being included in a first group of reference pictures and (ii) a reference picture that is not of the particular type of reference picture; and decoding a current picture of the video using the list, wherein the creating the list includes: selecting, from among the plurality of reference pictures, the first group of reference pictures based on a parameter indicating a temporal level of the reference picture of the particular type of reference picture, the first group of reference pictures including the reference picture of the particular type of reference picture, a frame rate corresponding to the temporal level of each of the reference pictures included in the first group of reference pictures indicating a lowest frame rate from among frame rates of the plurality of reference pictures; selecting, from among the plurality of reference pictures, a second group of reference pictures, the second group of reference pictures including the reference picture that is not of the particular type of reference picture; placing the first group of reference pictures at a predetermined position in the list, the first group of reference pictures being selected based on the parameter indicating the temporal level of the reference picture of the particular type of reference picture, the temporal level of each of the reference pictures included in the first group of reference pictures being lower than the temporal level of each of the reference pictures included in the second group of reference pictures, a frame rate of the first group of reference pictures being lower than the frame rate of the second group of reference pictures; and placing the second group of reference pictures after the first group of reference pictures in the list, each of the reference pictures included in the first group of reference pictures preceding each of the reference pictures in the second reference pictures in the list.
28. The apparatus according to claim 27, wherein the reference picture that is not of the particular type of reference picture is located closest to the current picture.
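
For illustration only, the list creation recited in the claims above may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `RefPicture` type, its `poc` (picture order count) field, the function name `build_reference_list`, the choice of the top of the list as the predetermined position, and the ordering by temporal distance within each group are all hypothetical. Only the grouping by lowest temporal level and the placement of the second group after the first group follow the claim language.

```python
# Illustrative sketch only. RefPicture, poc, and build_reference_list are
# hypothetical names; ordering inside each group is an added assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class RefPicture:
    poc: int             # display-order position (assumed field)
    temporal_level: int  # parameter parsed from / written to the picture


def build_reference_list(refs: list[RefPicture], current_poc: int) -> list[RefPicture]:
    """Place lowest-temporal-level pictures first, all other pictures after."""
    lowest = min(p.temporal_level for p in refs)

    # First group: pictures at the lowest temporal level (lowest frame rate).
    first_group = [p for p in refs if p.temporal_level == lowest]
    # Second group: every remaining reference picture.
    second_group = [p for p in refs if p.temporal_level != lowest]

    # Assumption: within a group, temporally closer pictures come first.
    def distance(p: RefPicture) -> int:
        return abs(current_poc - p.poc)

    first_group.sort(key=distance)
    second_group.sort(key=distance)

    # Assumption: the predetermined position is the top of the list, so the
    # first group receives the smallest reference indexes.
    return first_group + second_group


if __name__ == "__main__":
    refs = [
        RefPicture(poc=8, temporal_level=0),
        RefPicture(poc=4, temporal_level=1),
        RefPicture(poc=6, temporal_level=2),
        RefPicture(poc=7, temporal_level=3),
        RefPicture(poc=0, temporal_level=0),
    ]
    ordered = build_reference_list(refs, current_poc=9)
    print([(p.poc, p.temporal_level) for p in ordered])
```

The position of a picture in the returned list plays the role of its reference index in this sketch, so every picture in the first group precedes every picture in the second group.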